INTRODUCTION

This thesis describes an experimental information retrieval system, EUREKA, designed and implemented at the University of Illinois by a research group under the direction of Dr. D.J. Kuck.

The EUREKA system was designed to provide a test system for studying several interesting problems in information retrieval, file manipulation, and large database systems. Whereas most current information retrieval systems retrieve abstracts or titles of documents by means of predefined index terms, EUREKA is organized around a database containing the entire text of each document, with each document indexed under every word occurring in it (an arrangement often referred to as inverted file organization). Beyond this assumption about file organization and content, the structure of EUREKA has been kept highly modular in order to facilitate the generation of variant systems for comparing various methods of handling this type of data.
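The inverted file organization described above can be sketched in a few lines of Python. This is only an illustration, not EUREKA's actual file structure: the tokenizer (upper-casing the text and splitting on anything that is not a letter or digit) and the function names are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Hypothetical tokenizer: upper-case the text and keep runs of letters/digits.
    return re.findall(r"[A-Z0-9]+", text.upper())

def build_inverted_file(documents):
    """Map every word to the sorted list of document numbers containing it.

    `documents` maps a document number to that document's full text.
    """
    postings = defaultdict(set)
    for doc_number, text in documents.items():
        for word in tokenize(text):
            postings[word].add(doc_number)
    # Each postings list is kept in document-number order.
    return {word: sorted(docs) for word, docs in postings.items()}

docs = {
    1: "Formal languages and automata.",
    2: "Automata theory applied to retrieval.",
    3: "Precision and recall in retrieval systems.",
}
inverted = build_inverted_file(docs)
print(inverted["AUTOMATA"])   # [1, 2]
print(inverted["RETRIEVAL"])  # [2, 3]
```

Indexing at the paragraph level instead of the document level would change only what is stored in each postings list (for example, document and paragraph number pairs instead of document numbers).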

Some of the topics currently being studied include:

1) The effect of query language features on user performance;

2) Analysis of bottlenecks in information flow within the system;

3) Comparison of various levels of indexing, i.e. whether the inverted file should contain postings lists for documents or for paragraphs within documents;

4) Effects of tradeoffs between the use of indexing to various levels and the use of full-text searching;

5) Design and analysis of special-purpose hardware for use in information retrieval systems of this type;

6) Methods for handling non-textual information, such as tables and graphic items;

7) Development of automatic user aids to augment and improve the user's performance and recall/precision ratio;

8) Analysis of user performance on various types of retrieval problems, using a variety of combinations of the aforementioned modifications of the basic EUREKA system, to weigh the benefits of these modifications under different types of retrieval demands;

9) Collection of data for use in simulation studies of hardware/software systems for performing the data manipulation operations inherent in this type of system.

One can easily see from the above list of current research problems that the basic EUREKA system must be modified by various researchers in the course of their studies. This thesis is intended to provide the basic information necessary for performing these modifications. Chapter Two consists of the User's Guide to EUREKA, presenting the end user's view of the system currently in use for studies of user performance. Chapter Three is the system documentation for EUREKA; it includes descriptions of most of the EUREKA information structures and the routines that manipulate them.


2 
USER'S GUIDE

2.1 INTRODUCTION

EUREKA is an experimental information retrieval system, based on full-text searching, that is being constructed by a research group under the direction of Dr. David Kuck at the University of Illinois.

Since EUREKA is experimental in nature, it has been developed with ease of implementation, measurement, and modification occasionally taking precedence over ease of use.

One of the primary design goals of this project is to determine which features are necessary and/or desirable from the user's viewpoint. Our current query language is an attempt to provide the user with a basic set of tools so that he/she may begin using the system and we may begin the process of monitoring and improving both the query language and the system. We hope that this will eventually lead to a better understanding of the man-machine interface problem and, hence, to a better query language and system.


As one of the primary functions of EUREKA is to provide information about the system's use of hardware/software resources and about users' use of the system, all users and all processes will be monitored extensively. From this information, and from user interviews and suggestions, we hope to obtain a fairly clear view of what users expect from an information retrieval system and how best to provide for their needs.

Access to the system will be virtually unlimited, at least initially. User codes will be assigned so that each user may maintain private files on disk between sessions.

This manual is intended to serve as a guide for the inexperienced user and is therefore more verbose (and hopefully more helpful) than the usual user's guide to a system. Users with experience on other time-sharing or information retrieval systems may wish to merely skim the bulk of this guide, studying the examples and command definitions without spending too much time on the explanations.

2.2 WHAT DOES EUREKA DO?

EUREKA is a tool for use by anyone desiring to find specific information in a set of documents. More specifically, it allows a person to find documents, authors, or even sentences within a document that satisfy some set of restrictions specified by the user. A session at a terminal using EUREKA is equivalent to using a card catalog to find documents which might be of interest, then finding the documents in the stacks, selectively scanning through the documents to determine their actual usefulness and to find more reference terms under which useful information is likely to be found, and repeating this process for the new search terms until the required information is found. While this process might consume one or more days of the user's time if he/she were to conduct the search in person in a library, by using EUREKA he/she might well be able to accomplish his/her goals in a matter of minutes, or an hour or two at most.

EUREKA accomplishes this by doing most of the searching, retrieving, and record keeping, allowing the user to concentrate on the intellectual aspects of the search. EUREKA allows the user to enter a search request consisting of a group of words that the user feels characterize the information for which he/she is searching. This is equivalent to the user searching the card catalog for entries under those terms. Since EUREKA is a full-text searching system, every document is indexed under every word contained within its text, rather than under only a few index terms as in a card catalog. This allows the user much more freedom in selecting search terms and in forming his/her search strategy. Another feature of EUREKA not feasible in a card catalog is that the user may specify the context within which the search terms must occur. In EUREKA a user may specify that several words must occur within the same sentence, paragraph, etc., rather than merely anywhere in the document. EUREKA thus frees the user from many of the trivialities of searching through card catalog trays, worrying about alphabetical sequence, trying to find terms in a strictly controlled index vocabulary that adequately describe his/her request, and locating the documents in the stacks. It allows the user to see the results of his/her search strategy immediately, without having to leave the place where he/she conducts the search in order to locate the documents before determining their relevance. Once a document has been retrieved by EUREKA, the user may use EUREKA to view any portion of it on-line and to print selected portions of it (up to and including the entire document). This also allows the user to find new search terms and to evaluate and modify his/her search strategy accordingly, without a tedious and time-consuming session in the stacks.



Another useful feature of EUREKA is the record keeping function it performs for the user. A user may keep a record of all of the searches he/she has already completed and the results of these searches. The user may also attach comments to these documents and search results, which he/she may view later on-line to help keep track of what he/she has already done. Various other aids exist to assist the user in fulfilling his/her information requirements with a minimum of effort.

2.3 HOW DO I USE EUREKA?

Using EUREKA is relatively simple. EUREKA functions primarily by determining whether or not certain words appear in a document. Thus, by using a very simple set of commands, the user may direct the system to find all documents containing some combination of words that he/she feels will characterize the documents in which he/she is interested. By specifying various options (explained in a later section), the user may view selected sentences, paragraphs, titles, etc. from the documents retrieved by EUREKA. Whether or not any options are selected, the system will respond with a list of all documents containing the words or phrases specified by the user, together with the relative rank of each document. This relative rank is computed for each document from the number of occurrences of each search term in the document. The document containing the largest total number of occurrences of search terms is ranked number one, the document with the next largest total is ranked number two, and so on, down to the document containing the smallest total number of occurrences of all the search terms. At this point the user may use other commands to view on-line portions of the documents just retrieved. By doing so he/she may evaluate the actual relevance of each document, and may also find new search terms to use in searching for more documents.
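The ranking rule described above can be illustrated with a short Python sketch. It assumes the set of retrieved documents is already known and simply totals the occurrences of the search terms in each one. The tie-breaking rule (lower document number first), the function name, and the sample texts are assumptions, since the guide does not say how ties are ordered.

```python
import re
from collections import Counter

def rank_documents(documents, retrieved, search_terms):
    """Rank retrieved documents by total occurrences of the search terms.

    Returns (rank, document number, total occurrences) tuples, rank 1 first.
    Ties are broken by document number (an assumption for this sketch).
    """
    terms = {t.upper() for t in search_terms}
    totals = {}
    for doc_number in retrieved:
        words = Counter(re.findall(r"[A-Z0-9]+", documents[doc_number].upper()))
        totals[doc_number] = sum(words[t] for t in terms)
    ordered = sorted(totals, key=lambda d: (-totals[d], d))
    return [(rank, d, totals[d]) for rank, d in enumerate(ordered, start=1)]

docs = {
    1: "Cheese factory near the road.",
    2: "Roads and highways.",
    3: "The cheese factory ships cheese by road.",
}
print(rank_documents(docs, [1, 3], ["CHEESE", "ROAD"]))
# [(1, 3, 3), (2, 1, 2)]
```

Document 3 ranks first because "CHEESE" occurs twice and "ROAD" once in it, for a total of three occurrences against two in document 1.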

The list of all documents retrieved by a query statement is saved by the system for later use. After conducting several such searches, and thus generating several document lists (called query sets), the user may use other commands in the query language to compare and combine the query sets, generating new lists of documents that meet his/her requirements more exactly.

For example, consider a legal secretary searching for legal statutes pertaining to roads near cheese factories, but not pertaining to interstate highways. If this user were to use EUREKA to search the State Statutes data base, a possible search strategy would be:

1) Search for all documents (actually chapters of the State Statutes) containing the words "CHEESE" and "FACTORY" in the same sentence.

2) Search for all documents containing the word "ROADS".

3) Search for all documents containing both the words "INTERSTATE" and "HIGHWAYS" within the same sentence.

At this point the searcher has three lists of documents (query sets) that he/she may compare and combine:

4) Compare the list of documents responding to query #1 to the set of documents responding to query #2, selecting the documents that appear in both lists. This gives us a set of documents pertaining to both cheese factories and roads.

5) Compare the list of documents generated in step 4 (query set #4) to query set #3, selecting only those documents appearing in query set #4 but not in query set #3. This eliminates any documents referring to interstate highways.
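Steps 4 and 5 are ordinary set operations on query sets: step 4 is an intersection and step 5 is a difference. The Python sketch below shows them with made-up document numbers; the contents of the three query sets are invented purely for illustration.

```python
# Hypothetical query sets: document numbers returned by queries #1 to #3.
q1 = {4, 9, 17, 23}        # 'CHEESE' and 'FACTORY' in the same sentence
q2 = {2, 9, 17, 31, 40}    # 'ROADS' anywhere in the document
q3 = {17, 40, 52}          # 'INTERSTATE' and 'HIGHWAYS'

q4 = q1 & q2   # step 4: documents in both query set #1 and query set #2
q5 = q4 - q3   # step 5: documents in query set #4 but not in query set #3

print(sorted(q4))  # [9, 17]
print(sorted(q5))  # [9]
```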

The user may then use other commands of the system to view portions of the documents (or have them printed on a line printer) and discover that cheese factories may not be located within four hundred feet of a dirt road. If satisfied, the user may log off; if not, he/she may continue the search using new search terms.



The EUREKA commands to perform all of the preceding actions would be:

LOGON ANYBDY

FIND 'CHEESE' * 'FACTORY' IN SENTENCE = CHEZFAC

FIND 'ROADS' FROM ALL = ROADLIST

FIND 'INTERSTATE' * 'HIGHWAYS' FROM ALL = TURNPIKES

MAKE CHEZFAC * ROADLIST = TEMPLIST

MAKE TEMPLIST - TURNPIKES = FINALLIST

All of the above commands will be explained in detail in a later section.

2.4 PRELIMINARIES TO USING EUREKA

First let us define some terms:

DOCUMENT:

One logical division of text that is given a document number and is indexed under every word that occurs within the bounds of that division. The size and logical division may vary between data bases. In the information retrieval data base one journal article is taken to be one document, while in the State Statutes data base one chapter of the statutes is taken to be a document. In the business abstracts data base, each abstract is a document. As more data bases are added, divisions will be determined for them ad hoc.

DOCUMENT NUMBER:

A number arbitrarily assigned to each document by EUREKA so that unnecessarily long document names do not have to be remembered and handled by the user or the system.

QUERY:

A command to EUREKA in the EUREKA query language; usually a search command (FIND statement) or a query set manipulation command (MAKE statement) that generates a query set.

QUERY NUMBER:

Each query entered by the user is automatically assigned a number (which is printed out each time EUREKA notifies the user of its readiness to accept a new command) by which the user may refer to the results of that query or to any comments attached to that query by the user.

QUERY SET:

The list of documents retrieved by a query. This query set may later be referred to by query number or by a user-assigned query set name for use in other queries or in PRINT statements.

SEARCH SETS:

It is assumed that the user of EUREKA will normally wish to conduct his/her searches in gradual steps rather than by one gigantic, complicated query statement. In most cases this will involve conducting a search on fairly general search terms first and then narrowing down the number of documents retrieved by searching, on yet other search terms, just those documents that responded to the first query. To facilitate this mode of operation, EUREKA allows the user to specify any query set (or combination thereof) as the set of documents to be searched by a new search statement (FIND statement). This set of documents from which the search is to be conducted is referred to as the search set. The search set is specified by including an optional "from" clause in the query language statement that specifies the search to be performed (FIND statement). If no search set is specified by the user, "FROM ALL" is assumed; that is, we assume that the user wishes to search the entire data base rather than any restricted set.

CONTEXT:

Each document is divided into several parts, such as title, author, body, abstract, sentence, etc. Each of these divisions is referred to as a context. Note that the contexts "sentence" and "paragraph" may occur arbitrarily many times in each document, but all other contexts occur only once per document.
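To make the idea of a context concrete, the following Python sketch models a document as a set of contexts and checks whether all search terms occur within a single occurrence of one context. The document layout (single-occurrence contexts as strings, paragraphs as lists of sentences), the sample text, and the function names are all hypothetical; other contexts such as body or abstract are not modeled.

```python
import re

def words(text):
    return set(re.findall(r"[A-Z0-9]+", text.upper()))

# Hypothetical layout: contexts that occur once map to a single string;
# "PARAGRAPH" holds a list of paragraphs, each a list of sentences.
document = {
    "TITLE": "Cheese Factories",
    "AUTHOR": "Anonymous",
    "PARAGRAPH": [
        ["No cheese factory may be built near a road.", "Roads are public."],
        ["Factories must be inspected yearly."],
    ],
}

def occurrences(doc, context):
    """Yield the text of each occurrence of a context in the document."""
    if context == "SENTENCE":
        for paragraph in doc["PARAGRAPH"]:
            yield from paragraph
    elif context == "PARAGRAPH":
        for paragraph in doc["PARAGRAPH"]:
            yield " ".join(paragraph)
    else:
        yield doc[context]

def all_in_same(doc, terms, context):
    """True if one single occurrence of `context` contains every term."""
    wanted = {t.upper() for t in terms}
    return any(wanted <= words(text) for text in occurrences(doc, context))

print(all_in_same(document, ["CHEESE", "FACTORY"], "SENTENCE"))  # True
print(all_in_same(document, ["CHEESE", "ROADS"], "SENTENCE"))    # False
print(all_in_same(document, ["CHEESE", "ROADS"], "PARAGRAPH"))   # True
```

The second and third calls show why the choice of context matters: "CHEESE" and "ROADS" share a paragraph but not a sentence.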
2.5 SEARCH SETS, QUERY SETS, AND QUERY NUMBERS - AN EXAMPLE

Let us consider a person searching the information retrieval data base for information concerning the application of formal languages and automata theory to information retrieval. After the user has logged into EUREKA (as explained later), EUREKA would type out:

QUERY #00001:
#

This is the signal to the user that EUREKA is ready to accept query number one (notice that 1 is the query number). One possible search strategy is to first search on the terms "FORMAL" and "LANGUAGE", requesting a list of all documents that have one or more occurrences of both search terms in them. The list of documents that EUREKA responds with is called the query set. As soon as EUREKA completes typing out the list of documents containing both the words "FORMAL" and "LANGUAGE", it will advance to a new line and type:

QUERY #00002:
#

The user could then enter query #2, which would consist of a search statement asking for a list of documents containing the word "AUTOMATA". If the user specifies "FROM LAST", EUREKA will assume that he/she wants to search only those documents that are members of query set #1 (that is, documents that contain both the word "FORMAL" and the word "LANGUAGE"). The user could alternatively specify that the entire data base is to be searched by not specifying any "from" clause. This second query statement would generate a new list of documents, which would be query set #2. Both query set #1 and query set #2 can now be referred to by later "from" clauses in determining a search set.
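The bookkeeping in this example (automatic query numbers, saved query sets, and a search set chosen by a "from" clause) can be sketched in Python as follows. The `Session` class and its `find` method are hypothetical illustrations, not EUREKA routines, and `find` handles only the simple case in which every term must appear (the "*" operator).

```python
class Session:
    """Hypothetical model of query numbers, query sets, and search sets."""

    def __init__(self, inverted_file, all_documents):
        self.inverted_file = inverted_file        # word -> set of document numbers
        self.all_documents = set(all_documents)
        self.query_sets = {}                      # query number -> query set
        self.next_number = 1                      # number shown in "QUERY #0000n:"

    def find(self, terms, search_set="ALL"):
        """Find documents containing every term, looking only in the search set.

        search_set may be "ALL" (the default, as when no from clause is given),
        "LAST" (the previous query set), or an earlier query number.
        """
        if search_set == "ALL":
            candidates = self.all_documents
        elif search_set == "LAST":
            candidates = self.query_sets[self.next_number - 1]
        else:
            candidates = self.query_sets[search_set]
        result = set(candidates)
        for term in terms:
            result &= self.inverted_file.get(term, set())
        number = self.next_number
        self.query_sets[number] = result          # saved for later from clauses
        self.next_number += 1
        return number, result

inverted = {"FORMAL": {1, 2, 5}, "LANGUAGE": {1, 5, 7}, "AUTOMATA": {5, 7, 8}}
session = Session(inverted, range(1, 10))
n1, q1 = session.find(["FORMAL", "LANGUAGE"])     # query #1
n2, q2 = session.find(["AUTOMATA"], "LAST")       # query #2, FROM LAST
n3, q3 = session.find(["AUTOMATA"])               # query #3, whole data base
print(n1, sorted(q1), n2, sorted(q2), n3, sorted(q3))
# 1 [1, 5] 2 [5] 3 [5, 7, 8]
```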

2.6 TERMINAL OPERATION

The terminal currently in use with EUREKA is an Infoton CRT terminal. Its keyboard is very similar to that of an electric typewriter, but there are no lower case letters. Since the shift key is no longer needed to supply capital letters, it is used on the terminal keyboard to supply special characters (such as <, >, ", ', and [). These special characters are, for the most part, printed above the letters on the terminal keyboard.

One other point about keys should be noted. On a terminal keyboard the digit zero ("0") and the letter "O" are not the same. The zero key is between the "9" key and the ":" key on the top row of keys, while the "O" key is between the "I" and "P" keys on the next row down. Also note that there are both single quote and double quote keys, the double quote key being SHIFT-2 and the single quote key being SHIFT-7. EUREKA makes use of both single and double quotes, so do not attempt to use two single quotes where one double quote is called for.

The only other special keys the user need be aware of are the CTRL key, the RETURN key, and the RUB OUT key. The RETURN key is used to cause a carriage return and to send the line of characters you have just typed on the keyboard to EUREKA. While you type, the terminal holds what you type in a temporary storage area until you push the RETURN key, at which time the entire line is sent to EUREKA. If, while you are typing in a command to EUREKA, you discover that you have just hit the wrong key, you may delete the last character you typed by hitting the RUB OUT key. Pushing the RUB OUT key twice deletes the last two characters, and so on. The CTRL key is similar to the SHIFT key in that it assigns a new set of meanings to other keys on the keyboard. If, while typing in a line, you notice a mistake back in the first part of the line and don't want to use the RUB OUT key to rub out the entire line back to that point, you may delete the entire line by holding down the CTRL key and pushing the "U" key.
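The line editing just described behaves like a simple character buffer. The Python sketch below simulates it: characters accumulate until RETURN, RUB OUT removes the last one, and CTRL-U empties the buffer. The character codes used for the keys (ASCII DEL for RUB OUT, ASCII 0x15 for CTRL-U) are assumptions about what the Infoton sends; the guide does not give them.

```python
RUB_OUT = "\x7f"   # assumed: ASCII DEL, the usual code for a RUB OUT key
CTRL_U = "\x15"    # assumed: ASCII code for CTRL-U
RETURN = "\r"

def edit_lines(keystrokes):
    """Return the lines that would be sent to EUREKA for a keystroke sequence."""
    sent, buffer = [], []
    for key in keystrokes:
        if key == RETURN:            # send the whole buffered line
            sent.append("".join(buffer))
            buffer = []
        elif key == RUB_OUT:         # delete the last character typed
            if buffer:
                buffer.pop()
        elif key == CTRL_U:          # delete the entire line typed so far
            buffer = []
        else:
            buffer.append(key)
    return sent

print(edit_lines("FIMD" + RUB_OUT * 2 + "ND 'DOG'" + RETURN))   # ["FIND 'DOG'"]
print(edit_lines("GARBAGE" + CTRL_U + "LOGOFF" + RETURN))      # ['LOGOFF']
```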

The user should note that EUREKA can send and receive at the same time, even though commands typed in while EUREKA is working on something else are not echoed on the terminal until EUREKA finishes whatever it was working on. Therefore, it is unwise to play with the keyboard while EUREKA is working on a command you have just entered, since whatever you type in will eventually be sent to EUREKA.

In order to prevent information from being flashed on the screen faster than you can read it, EUREKA pauses every 15 lines or at the end of each item you have requested to have printed, whichever occurs first. If the line count has been reached, EUREKA will print a "»" on the next line of the screen to inform you of this fact. If the end of an item has been reached, EUREKA uses a "!" to signal you. When signalled by either a "»" or a "!", you may then instruct EUREKA to continue with the current output by pushing the carriage return key, stop the current output (the current search continues, however) by pressing the "K" key followed by a carriage return, skip to the next document in the list to be printed by pressing "S" followed by a carriage return, or enter browse mode to skip around within the document currently being printed.
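The pausing rule above (stop after 15 lines or at the end of an item, whichever comes first, then obey the user's reply) can be simulated with a short Python sketch. Browse mode is not modeled, the pause flags are parameters whose defaults follow the text above, and the function name and sample data are hypothetical.

```python
PAGE_LINES = 15  # EUREKA pauses after this many lines of output

def page_output(items, replies, line_flag="»", item_flag="!"):
    """Simulate EUREKA's paged output (a sketch; browse mode is not modeled).

    items   -- the documents to print, each a list of output lines
    replies -- the user's answer at each pause: "" (carriage return) continues,
               "K" stops the output, "S" skips to the next document
    Returns the lines that would appear on the screen, including pause flags.
    """
    answers = iter(replies)
    screen = []
    for lines in items:
        since_pause = 0
        for i, line in enumerate(lines):
            screen.append(line)
            since_pause += 1
            end_of_item = i == len(lines) - 1
            if end_of_item or since_pause == PAGE_LINES:
                screen.append(item_flag if end_of_item else line_flag)
                since_pause = 0
                answer = next(answers, "")   # missing replies count as RETURN
                if answer == "K":
                    return screen
                if answer == "S":
                    break
    return screen

items = [[f"DOC 1 LINE {n}" for n in range(1, 21)], ["DOC 2 LINE 1"]]
screen = page_output(items, ["S", ""])
print(len(screen), screen[15], screen[-1])   # 18 » !
```

Here the first document is cut off after its first 15 lines because the user answers "S" at the first pause; the second document is printed in full and ends with the end-of-item flag.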

Browse  Mode 

Browse mode is merely a method of viewing sentences or paragraphs adjacent to the one currently being printed. To skip back to the previous sentence, type in "-S" and push the carriage return key. Similarly, the previous paragraph may be viewed by typing "-P". To skip forward one sentence or paragraph, type "+S" or "+P". If, while viewing portions of a document, you decide to look at the title or author (or any other context), you have only to type in the correct context identifier (see Appendix A) and push carriage return when EUREKA has paused to allow you to respond. Once any of the commands for skipping around within a document or printing other contexts have been used after a "»" or "!" flag, the user is said to be in browse mode. In order to get out of browse mode and resume the output from the current document, type in "E" followed by a carriage return.
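The browse-mode commands amount to moving a cursor over the sentences and paragraphs of a document. The Python sketch below follows a sequence of commands and returns what would be displayed. Context identifiers (title, author, and so on) are not modeled, the function name is hypothetical, and placing the cursor on the first sentence of a paragraph after "+P" or "-P" is an assumption.

```python
def browse(paragraphs, p, s, commands):
    """Follow browse-mode commands starting at paragraph p, sentence s.

    `paragraphs` is a list of paragraphs, each a list of sentences.
    "+S"/"-S" show the next/previous sentence, "+P"/"-P" show the whole
    next/previous paragraph, and "E" leaves browse mode.
    Returns the text displayed for each command, in order.
    """
    flat = [(i, j) for i, para in enumerate(paragraphs) for j in range(len(para))]
    shown = []
    for cmd in commands:
        if cmd == "E":                       # leave browse mode
            break
        if cmd in ("+S", "-S"):
            k = flat.index((p, s)) + (1 if cmd == "+S" else -1)
            p, s = flat[max(0, min(k, len(flat) - 1))]   # stay inside the document
            shown.append(paragraphs[p][s])
        elif cmd in ("+P", "-P"):
            p = max(0, min(p + (1 if cmd == "+P" else -1), len(paragraphs) - 1))
            s = 0                            # assumed: cursor moves to first sentence
            shown.append(" ".join(paragraphs[p]))
        else:
            raise ValueError(f"unknown browse command: {cmd!r}")
    return shown

paras = [["A1.", "A2."], ["B1.", "B2."], ["C1."]]
print(browse(paras, 1, 0, ["-S", "+P", "+S", "E", "+S"]))
# ['A2.', 'B1. B2.', 'B2.']
```

Commands after "E" are ignored, mirroring the return to normal output.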

2.7 THE QUERY LANGUAGE

There are only nine commands in the EUREKA query language. Only two of these are necessary for conducting searches, while the other seven perform auxiliary functions. In brief, the functions of these commands are:

FIND:

The FIND statement is the heart of the EUREKA system. It is used to perform searches for documents containing a user-selected set of words or phrases.

MAKE:

The MAKE statement is used to compare and combine sets of documents created by the FIND statement.

COMMENT:

The COMMENT statement is used to write notes to yourself concerning a query set or a particular document. These notes may be retrieved at a later time by use of the PRINT statement.

CHANGE:

The CHANGE statement is used to assign a name to a query set or to change the existing name of a query set.

DELETE:

The DELETE statement is used to delete query sets, macros, and/or comments that are no longer needed.

PRINT:

The PRINT statement is used to print user comments, selected portions of a document (up to and including the entire document), and information about preceding queries and their resultant query sets.

LOGON:

The LOGON statement is used to identify the user to EUREKA so that EUREKA can gain access to the correct user files and data base.



LOGOFF:

The LOGOFF command is used to terminate a session. It disconnects a user from the EUREKA system and closes his/her files.

DEFINE:

The DEFINE statement is used to give a name to a list of search terms so that the user does not have to repeatedly type in long search expressions. These macro definitions are saved in the user file area and may be used in conjunction with other search terms in FIND statements.

Each of the query language commands has a very simple basic form, which may be used alone or with the addition of optional clauses that significantly increase its power. This allows the user to begin with very simple query statements and progress to more complicated forms when the need arises.

The EUREKA query language is keyword driven. This means that EUREKA figures out what a command typed in by the user means by looking for special words that tell it to perform a specific operation. These keywords are usually followed by one or more user-supplied parameters that control how the operation is performed.
Although this sounds somewhat complicated, it is actually quite simple. Algebra is another example of a keyword-driven language. In the algebraic expression:

A = 3X + 5

the equal sign "=" is a keyword telling anyone looking at it that whatever appears on its right side is equal to whatever appears on its left. Similarly, the plus sign, "+", is a keyword for the add operation, while A, 3, X, and 5 are parameters.


In discussing the EUREKA query language we will wish to represent parameters in a general fashion in addition to giving specific examples, since it would take an impossibly large number of examples to cover the possible range of each command entirely. Therefore, if we wish to describe a parameter, we will just use an English phrase that describes the parameter and then describe allowable forms for the parameter by giving a definition for the phrase. However, this causes a slight problem. Since most of the EUREKA keywords are English words, we need something to distinguish the EUREKA keywords from the English phrases! For this purpose we will enclose the English phrases in characters not used in the EUREKA query language. Since neither the less-than symbol (<) nor the greater-than symbol (>) is used in the EUREKA language, we will use them to separate the phrases from the keywords. For example, the algebraic expression used earlier could be represented by:

<VARIABLE> = <FORMULA>

where <FORMULA> is of the form:

<DIGIT><VARIABLE> + <DIGIT>

and so forth.

The notation just presented is definitely easier to understand after seeing several examples than it is to describe formally. Over the next few pages, the user should become familiar enough with the notation to make sense of it by comparing the examples with the descriptions written in this notation.

2.7.1 THE FIND STATEMENT

The FIND statement is used to enter search requests. Its basic form is:

FIND <SEARCH EXPRESSION>

The keyword is "FIND", but for brevity this may be abbreviated "F". The parameter is a combination of words, enclosed in single quotes, for which the search is to be conducted. Examples are:

FIND 'PRECISION'

FIND 'PRECISION' * 'RECALL'

F 'CATS' * 'DOGS' + 'MICE'

In the above examples, the first one directs EUREKA to find all documents in which the word "PRECISION" appears. The second example directs EUREKA to find all documents in which both the word "PRECISION" and the word "RECALL" appear. Note that "*" means "both what is on the right and what is on the left must appear". The final example directs EUREKA to find all documents in which both the word "CATS" and the word "DOGS" appear, and also to retrieve all documents in which the word "MICE" appears. Note that "+" means "either what is on the left or what is on the right must appear". For this reason, "+" is referred to as "OR", while "*" is referred to as "AND".

Another fact to note about the final example is that the "AND" is considered before the "OR". That is, the third example is taken to mean

"All documents containing either the word 'MICE' or both the word 'CATS' and the word 'DOGS'"

rather than

"All documents containing both the word 'CATS' and either of the words 'DOGS' or 'MICE'".

A little perusal will show the reader that these two sentences do not mean the same thing. This is similar to the problem in algebra of deciding whether

A = 4/X-5

means

A = 4/(X-5)

or

A = (4/X)-5

Since EUREKA always assumes "AND" is to be done before "OR", if one wishes to tell EUREKA to find "all the documents containing the word 'CATS' and either of the words 'DOGS' or 'MICE'", one must use parentheses to alter the order of evaluation of the search expression. For instance:

F 'CATS' * ('DOGS' + 'MICE')
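The precedence rule can be made precise with a small recursive-descent evaluator. The Python sketch below treats "*" as set intersection and "+" as set union over an inverted file, with "*" binding tighter and parentheses overriding the order. It is an illustration of the rule stated in this section, not EUREKA's own parser; the function names and the sample inverted file are hypothetical, and the universal character "#" is not handled.

```python
import re

# A term is text in single quotes; the operators are *, +, and parentheses.
TOKEN = re.compile(r"\s*(?:'([^']*)'|([*+()]))")

def tokenize(expr):
    tokens, pos, expr = [], 0, expr.rstrip()
    while pos < len(expr):
        m = TOKEN.match(expr, pos)
        if m is None:
            raise ValueError(f"cannot parse {expr[pos:]!r}")
        if m.group(1) is not None:
            tokens.append(("TERM", m.group(1).upper()))
        else:
            tokens.append(("OP", m.group(2)))
        pos = m.end()
    return tokens

def evaluate(expr, inverted_file):
    """Evaluate a search expression to a set of document numbers.

    '*' (AND) is intersection and binds tighter than '+' (OR), which is
    union; parentheses override this order.
    """
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of search expression")
        pos += 1
        return tokens[pos - 1]

    def or_expr():    # or_expr  := and_expr { '+' and_expr }
        result = and_expr()
        while peek() == ("OP", "+"):
            take()
            result = result | and_expr()
        return result

    def and_expr():   # and_expr := primary { '*' primary }
        result = primary()
        while peek() == ("OP", "*"):
            take()
            result = result & primary()
        return result

    def primary():    # primary  := 'term' | '(' or_expr ')'
        kind, value = take()
        if kind == "TERM":
            return set(inverted_file.get(value, ()))
        if value == "(":
            result = or_expr()
            if take() != ("OP", ")"):
                raise ValueError("expected ')'")
            return result
        raise ValueError(f"unexpected {value!r}")

    result = or_expr()
    if pos != len(tokens):
        raise ValueError("unexpected text after search expression")
    return result

inverted = {"CATS": {1, 2, 3}, "DOGS": {2, 4}, "MICE": {3, 5}}
print(sorted(evaluate("'CATS' * 'DOGS' + 'MICE'", inverted)))    # [2, 3, 5]
print(sorted(evaluate("'CATS' * ('DOGS' + 'MICE')", inverted)))  # [2, 3]
```

The two calls reproduce the two readings discussed above: without parentheses, document 5 (which contains "MICE" but not "CATS") is retrieved; with them, it is not.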

One final remark about search expressions. The words in single quotes
are called "search terms" and are used in the exact form typed in when
searching for matches in the document texts. Therefore, if one types in

FIND 'DOG'

one gets the list of documents containing the word "DOG", but not those
containing the words "DOGGIES", "DOGS", etc. (unless the word "DOG"
appears in them also). If the user wishes to search for all forms of a
word stem, then he may make use of the "universal character", "#". For
instance, the query

F 'DOG#'

would return the list of all documents containing any of the following
words: "DOG", "DOGS", "DOGMATIC", "DOGWOOD", etc. This is known as
suffixing, since it directs EUREKA to accept any suffix attached to the
word stem. Prefixing is also permitted. For instance,

F '#FIX'

would retrieve all documents containing any of the following words:
"PREFIX", "POSTFIX", "SUFFIX", etc. Both prefixing and suffixing may be
performed at once. The query

F '#IZ#'

would retrieve all documents containing such words as:
"AMERICANIZATION", "AMERICANIZED", "COMPUTERIZED", etc. However, if the
universal character appears in the middle of a word it is assumed to
actually be a pound sign. Therefore,

F 'A#D'

tries to find the word "A#D" rather than retrieving all documents
containing a word starting with "A" and ending with "D".
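The universal-character rule amounts to a simple pattern match on each indexed word. The Python sketch below uses a hypothetical function name, `matches`, and assumes that "#" may also match nothing at all, so that 'DOG#' retrieves "DOG" itself, as the list of examples above implies.

```python
def matches(pattern, word):
    """Does `word` satisfy a search term that may use the '#' universal character?"""
    prefix_wild = pattern.startswith("#")
    suffix_wild = len(pattern) > 1 and pattern.endswith("#")
    start = 1 if prefix_wild else 0
    end = len(pattern) - 1 if suffix_wild else len(pattern)
    stem = pattern[start:end]
    if prefix_wild and suffix_wild:
        return stem in word            # '#IZ#': stem anywhere in the word
    if prefix_wild:
        return word.endswith(stem)     # '#FIX': any prefix
    if suffix_wild:
        return word.startswith(stem)   # 'DOG#': any suffix
    return word == stem                # no wildcard, or '#' in the middle: exact match


words = ["DOG", "DOGS", "DOGWOOD", "PREFIX", "SUFFIX", "COMPUTERIZED", "A#D", "AND"]
print([w for w in words if matches("DOG#", w)])  # ['DOG', 'DOGS', 'DOGWOOD']
print([w for w in words if matches("A#D", w)])   # ['A#D']
```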

Now we may begin to add optional clauses to the basic FIND statement
in order to simplify certain operations and to make possible others
that cannot be done with the basic FIND statement. Options are just
like options on a car: they may be included if necessary or left out if
not needed.

2.7.1.1  CONTEXT CLAUSE

The first optional clause we shall discuss is the context option. Its
general form is:

FIND <Search Expression> IN <Context>

"IN" is the keyword to inform EUREKA that a search context follows, and
<Context> is the parameter that specifies the context in which the
words in the <Search Expression> must appear in order for the document
to be retrieved. For example, the query:

FIND 'DOG' * 'CAT' IN SENTENCE

directs EUREKA to retrieve all documents in which both the word "CAT"
and the word "DOG" appear in the same sentence within the document.
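A context clause changes the test from "both words occur in the document" to "both words occur within one context unit". The Python sketch below illustrates this for SENTENCE only. It assumes, purely for illustration, that sentences end at '.', '!' or '?'; the real context boundaries are those defined in Appendix A, and `same_sentence` is a hypothetical name.

```python
import re


def same_sentence(text, words):
    """True if at least one sentence of `text` contains every word in `words`."""
    wanted = {w.upper() for w in words}
    # Assumption: a sentence ends at '.', '!' or '?'.
    for sentence in re.split(r"[.!?]+", text):
        present = set(re.findall(r"[A-Z0-9#]+", sentence.upper()))
        if wanted <= present:
            return True
    return False


doc = "The dog barked at the cat. Later the cat slept."
print(same_sentence(doc, ["DOG", "CAT"]))                               # True
print(same_sentence("The dog barked. The cat slept.", ["DOG", "CAT"]))  # False
```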

The list of all allowable contexts is:

 1: SENTENCE
 2: PARAGRAPH
 3: DOCUMENT
 4: ARTICLE
 5: DATA
 6: AUTHOR
 7: TITLE
 8: SOURCE
 9: DATE
10: PAGES
11: MISC
12: INDEX
13: KEYS
14: TEXT
15: ABSTRACT
16: BODY
17: NOTES
18: REFERENCES
19: COMMENTS

Definitions of the various contexts appear in Appendix A. Note that any
context term may be abbreviated by truncating it to any length that
leaves it distinguishable from all context terms preceding it in the
above list. For instance, specifying "IN A" for a context is the same
as specifying "IN ARTICLE", while "IN AU" specifies "IN AUTHOR".
Examples of FIND statements containing context clauses are:

F 'SALTON' + 'LANCASTER' IN AUTHOR

which directs EUREKA to find all documents written by either Salton or
Lancaster.

FIND 'GARBAGE' IN COMMENTS

which directs EUREKA to find all documents to which the user has added
a comment (to be explained later) containing the word "GARBAGE".

F 'COMPUTER#' * 'LIBR#' IN TITLE

which directs EUREKA to find all documents that contain both a word
starting with the characters "COMPUTER" and a word starting with the
letters "LIBR" in the title.

F 'AUTOMATA' IN AB

which directs EUREKA to find all documents containing the word
"AUTOMATA" in their abstract.
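Because an abbreviation must be distinguishable from every context term that precedes the intended one, resolving it is equivalent to taking the first term in the list that begins with the typed letters. The following minimal Python sketch copies the list above; the function name `resolve_context` and the case-insensitive matching are assumptions made for illustration.

```python
CONTEXTS = [
    "SENTENCE", "PARAGRAPH", "DOCUMENT", "ARTICLE", "DATA", "AUTHOR", "TITLE",
    "SOURCE", "DATE", "PAGES", "MISC", "INDEX", "KEYS", "TEXT", "ABSTRACT",
    "BODY", "NOTES", "REFERENCES", "COMMENTS",
]


def resolve_context(abbrev):
    """Expand an abbreviated context term: the first list entry it prefixes wins."""
    abbrev = abbrev.strip().upper()
    if abbrev:
        for name in CONTEXTS:
            if name.startswith(abbrev):
                return name
    raise ValueError(f"unknown context: {abbrev!r}")


for typed in ["A", "AU", "AB", "SEN", "DAT", "DATE"]:
    print(typed, "->", resolve_context(typed))
```

Note that "DAT" resolves to DATA, so the full word must be typed to reach DATE.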



2.7.1.2  FROM CLAUSE

The next option we shall discuss is the FROM clause. The FROM clause is
used to specify the search set (the set of documents among which the
search is to be conducted; see Section 2.4). Its general form is:

FIND <Search Expr> FROM <From Set>

The keyword is "FROM", which directs EUREKA to search for the words in
the <Search Expression> only among the documents that meet the
requirements of the set expression <From Set>. The parameter <From Set>
is an expression involving query sets and documents. Query sets may be
referred to by either query number or query set name, while documents
are referred to by a list of document numbers separated by commas and
enclosed by square brackets. The general form for a <From Set> is:

<Set Term> <Set Op> <Set Term> <Set Op> ... <Set Term>

where <Set Term> is either a query set number, a query set name, or a
document list as described above. <Set Op> is one of the following:

"*", "+", or "-".

Since the concept of a <From Set> is difficult to describe rigorously
in English, let us resort to some examples.

FIND 'ALPHA' FROM 1

directs EUREKA to find all documents that responded to query #1 and
also contain the word "ALPHA".

F 'ALPHA' FROM 1 * 2

directs EUREKA to find all documents that responded to both query #1
and query #2 and also contain the word "ALPHA".

F 'ALPHA' FROM 1*3+2

directs EUREKA to find all documents that responded to either query #2
or to both query #1 and query #3, and that, in addition, contain the
word "ALPHA".

FIND 'ALPHA' FROM 1 - 2

directs EUREKA to find all documents that responded to query #1 but did
not respond to query #2, and also contain the word "ALPHA".

F 'ALPHA' * 'SOMETHING' IN SENTENCE FROM 1+[1,24,3]

directs EUREKA to find all documents that responded to query #1 and
contain the words "ALPHA" and "SOMETHING" in the same sentence, and to
also search documents 1, 24, and 3 for occurrences of the search terms.

FIND 'ALPHA' + 'BETA' FROM 1 -[3,19]

which directs EUREKA to search all documents responding to query #1
except documents #3 and #19 for an occurrence of either the word
"ALPHA" or the word "BETA".

F 'ALPHA'

directs EUREKA to search all documents in the data base for any that
contain the word "ALPHA".

F 'ALPHA' FROM LAST

EUREKA is directed to search for documents containing the word "ALPHA"
among all documents responding to the last query. That is, if this is
query #4 then all documents responding to query #3 are used as the
search set (just as if "FROM 3" had been specified). However, if this
happens to be query #1, then the entire data base is searched because
there are no preceding queries from which to search.

F 'ALPHA' FROM CHEZFAC - 3

This directs EUREKA to search for documents containing the word "ALPHA"
among all the documents in the query set named "CHEZFAC" by the user,
except any documents that responded to both query #3 and the query
named "CHEZFAC". Let us note in passing that "*" is used as a Boolean
"AND" operator, "+" is used as a Boolean "OR" operator, and "-" is the
Boolean "RELATIVE COMPLEMENT". Note also that parentheses may not be
used to alter the order of evaluation of the set expression. If one
wishes to have a complicated expression of set names/numbers not
obtainable by the FROM clause, one must use the MAKE statement (see
Section 2.7.3) to obtain a set equivalent to the desired set
expression.
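A <From Set> maps directly onto set operations: "*" is intersection, "+" is union, and "-" is relative complement. The Python sketch below is illustrative only. `QUERY_SETS` is a hypothetical table from query numbers and set names to sets of document numbers, and the rule that "+" and "-" are applied left to right after every "*" has been folded in is an assumption consistent with the "1*3+2" example; the text does not say how "+" and "-" interact with each other.

```python
import re

# Set term: a bracketed document list, a query number or set name, or an operator.
TOKEN = re.compile(r"\s*(\[[\d,\s]*\]|\w+|[*+-])")
OPS = ("*", "+", "-")


def eval_from_set(text, query_sets):
    """Evaluate a parenthesis-free <From Set> such as '1*3+2' or 'CHEZFAC - 3'."""
    text, tokens, pos = text.strip(), [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append(m.group(1))
        pos = m.end()
    if (len(tokens) % 2 == 0
            or any(t in OPS for t in tokens[0::2])
            or any(t not in OPS for t in tokens[1::2])):
        raise SyntaxError("expected <Set Term> <Set Op> ... <Set Term>")

    def term(tok):
        if tok.startswith("["):                      # explicit document list
            return {int(n) for n in tok[1:-1].split(",") if n.strip()}
        key = int(tok) if tok.isdigit() else tok.upper()
        return set(query_sets[key])                  # query number or set name

    # Pass 1: '*' (intersection) binds tightest, as in the '1*3+2' example.
    groups, ops = [term(tokens[0])], []
    for op, tok in zip(tokens[1::2], tokens[2::2]):
        if op == "*":
            groups[-1] &= term(tok)
        else:
            ops.append(op)
            groups.append(term(tok))
    # Pass 2 (assumed): '+' (union) and '-' (relative complement), left to right.
    result = groups[0]
    for op, value in zip(ops, groups[1:]):
        result = result | value if op == "+" else result - value
    return result


QUERY_SETS = {1: {1, 2, 3, 4}, 2: {3, 4, 5}, 3: {2, 4, 6}, "CHEZFAC": {4, 6, 7}}
print(eval_from_set("1*3+2", QUERY_SETS))        # {2, 3, 4, 5}
print(eval_from_set("CHEZFAC - 3", QUERY_SETS))  # {7}
```

The same evaluator would serve for the <Set Expr> of a MAKE statement, since the two are defined identically.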
2.7.1.3  QUERY SET NAMING CLAUSE

Since most of us will not want to keep track of large numbers of
relatively easy-to-forget query numbers, EUREKA allows the user to
specify a mnemonic name for any query set he/she creates. One method of
assigning a set name is via the set name clause attached to either a
"FIND" or "MAKE" statement (another method is via the "CHANGE"
statement, which will be described later).

The general form for the query set name clause is:

FIND <Search Expression> = <Setname>

in which "=" is the keyword that signals EUREKA that what follows is a
name the user wishes to have associated with the query set that will
result from this FIND statement. <Setname> may be any string of up to
ten letters and/or numbers (no special characters like +, ", or <) that
meets the following restrictions:

1)  Must  not  begin  with  a  number 

2)  Must  not  be  any  of  the  following  words: 

"ALL",  "FROM",  "COMMENTS",  "MACRO",  or  "LAST". 
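These naming rules are easy to check mechanically. A minimal Python sketch follows; `is_valid_setname` is a hypothetical name, and restricting "letters and numbers" to ASCII is an assumption.

```python
RESERVED = {"ALL", "FROM", "COMMENTS", "MACRO", "LAST"}


def is_valid_setname(name):
    """Check a proposed query set (or macro) name against the rules above."""
    return (
        1 <= len(name) <= 10              # up to ten characters
        and name.isascii()
        and name.isalnum()                # letters and/or numbers only
        and not name[0].isdigit()         # rule 1: must not begin with a number
        and name.upper() not in RESERVED  # rule 2: not a reserved word
    )


for candidate in ["DOGSET", "RAINCOATS", "3DOGS", "LAST", "CATS+DOGS", "VERYLONGNAME"]:
    print(candidate, is_valid_setname(candidate))
```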
Examples of FIND statements containing set naming clauses are:

FIND 'DOG#' FROM 3 = DOGSET

which directs EUREKA to search all documents in query set #3 for words
that begin with the letters "DOG" and then name the resulting query set
"DOGSET".

F 'ANORAK' + 'CAGOULE' IN TITLE FROM ALL = RAINCOATS

which directs EUREKA to search the entire data base for documents
containing either the word "ANORAK" or the word "CAGOULE" in the title
and then name the resulting query set "RAINCOATS".

F 'CHEESE' * 'FACTOR#' IN SENTENCE = CHEZFAC

which directs EUREKA to find all documents containing the word "CHEESE"
and a word beginning with "FACTOR" in the same sentence, and to name the
resulting query set "CHEZFAC". Now turn back to Section 2.3 and study
the example there.

2.7.1.4  COMMENTS CLAUSE

The  next  option  we  shall  discuss  is  the  comments  clause, 
which  is  used  for  attaching  user  comments  to  a  query  set. 
Comments  are  a  mechanism  for  writing  notes  to  oneself  that  may 
be  retrieved  at  a  later  time  via  the  PRINT  statement.  These 
comments  may  be  a  statement  of  the  purpose  of  creating  this 
particular  query  set,  the  number  of  documents  in  the  set,  or 
anything  else  the  user  feels  to  be  of  interest. 

The general form for the comments clause of the FIND statement is:

FIND <Search Expr> "<Comment String>"

In this case the keyword is in two parts, the two double quotes. The
comment string itself can be any string of words, numbers, etc., up to
256 characters in length. The only restriction on the character string
is that, since EUREKA uses double quotes as flags for finding the
beginning and end of the comment, if the user wishes to have double
quotes appear in his comment string he must enter two double quotes
side by side.

Examples of FIND statements with comment clauses are:

FIND 'CATS' * 'DOGS' = ANIMALSET "FIND CATS & DOGS"

FIND 'DOG#' "TRY OUT ""UNIVERSAL CHARACTER"""

Note that if the user requests (via the PRINT statement) to have the
comments from the second example printed, they will appear as:

TRY OUT "UNIVERSAL CHARACTER"
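The doubled-quote convention can be decoded with a single left-to-right scan. The Python sketch below is a hypothetical helper, not EUREKA code; it assumes the comment clause begins at the first double quote in the command and enforces the 256-character limit stated above.

```python
def parse_comment(command, limit=256):
    """Return the comment string of a command, or None if it has no comment clause."""
    start = command.find('"')
    if start < 0:
        return None
    chars, i = [], start + 1
    while i < len(command):
        ch = command[i]
        if ch == '"':
            if command[i + 1:i + 2] == '"':   # "" inside the comment -> one "
                chars.append('"')
                i += 2
                continue
            break                             # a single " closes the comment
        chars.append(ch)
        i += 1
    else:
        raise SyntaxError("comment string is not terminated")
    text = "".join(chars)
    if len(text) > limit:
        raise ValueError(f"comment longer than {limit} characters")
    return text


print(parse_comment('FIND \'DOG#\' "TRY OUT ""UNIVERSAL CHARACTER"""'))
# TRY OUT "UNIVERSAL CHARACTER"
```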

2.7.2  MACROS AND THE DEFINE STATEMENT

We must now explain the use of macros so that references to them will
be only confusing rather than incomprehensible in the descriptions of
other commands.

In  order  to  avoid  forcing  the  user  to  type  in  long  search 
term  expressions  repeatedly  when  conducting  several  searches, 
each  of  which  contains  some  of  the  same  sub-expressions,  we 
allow  the  user  to  define  sub-expressions  of  search  terms  as 
macros.   An  example  of  a  find  statement  using  a  macro  is: 

FIND 'CATS' * BUGS

where "BUGS" has previously been declared (by a DEFINE statement) to be
'TICKS' + 'FLEAS'. This FIND statement is equivalent to:

FIND 'CATS' * ('TICKS' + 'FLEAS')

Note that when macros are used in FIND search expressions, they are not
delimited by single quotes as are search terms. This is to help EUREKA
determine whether a term is actually a search term or a macro that must
be expanded. Note also that the text of a macro is enclosed in
parentheses when it is expanded, which may alter the order of
evaluation of the term expression.
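Macro expansion as described is textual substitution with a pair of parentheses added around each definition. The Python sketch below assumes a hypothetical `MACROS` dictionary from macro name to definition text. Quoted search terms are never expanded, because the quote marks are part of the token, and macros used inside other macro definitions are expanded recursively, with an arbitrary depth limit as a guard against circular definitions.

```python
import re

# A quoted search term, a bare word (possible macro name), whitespace, or any other character.
TOKEN = re.compile(r"'[^']*'|[A-Za-z][A-Za-z0-9]*|\s+|.")


def expand(expr, macros, depth=0, max_depth=20):
    """Replace each bare macro name with its parenthesized definition."""
    if depth > max_depth:
        raise RecursionError("macro nesting too deep (circular definition?)")
    out = []
    for tok in TOKEN.findall(expr):
        if tok.upper() in macros:               # quoted terms never match here
            body = expand(macros[tok.upper()], macros, depth + 1, max_depth)
            out.append("(" + body + ")")
        else:
            out.append(tok)
    return "".join(out)


MACROS = {"BUGS": "'TICKS' + 'FLEAS'", "CORES": "'FLEAPOWDER' * BUGS"}
print(expand("FIND 'CATS' * BUGS", MACROS))  # FIND 'CATS' * ('TICKS' + 'FLEAS')
print(expand("F CORES", MACROS))             # F ('FLEAPOWDER' * ('TICKS' + 'FLEAS'))
```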

Macros are declared by the use of a DEFINE statement, which has the
following format:

DEFINE <Search Expression> = <Macro Name>

In the DEFINE statement, "DEFINE" is the keyword that tells EUREKA what
to do with the rest of the command, and <Search Expression> is as
defined for the FIND statement (Section 2.7.1). The "=" is a keyword to
separate the search expression from the name the user wishes to assign
to the macro (<Macro Name>). The <Macro Name> must follow the same rules
set forth for the <Setname> (Section 2.7.1).

Examples of DEFINE statements are:

DEFINE 'TICKS' + 'FLEAS' = BUGS

which defines the macro used in the macro example above.

DEF 'FLEAPOWDER' * BUGS = CORES

Note that macros may be used within definitions of new macros. Also note
that the keyword "DEFINE" may be abbreviated "DEF".

2.7.3  MAKE STATEMENT

The MAKE statement is used to compare two or more query sets and
generate a new query set based on the results of the comparison. The
basic form of the MAKE statement is:

MAKE <Set Expr>

where "MAKE" is the keyword (which may be abbreviated "M"), and <Set
Expr> is the same as the <From Set> defined for the FIND statement (see
Section 2.7.1). This is because the MAKE statement is, in effect, an
explicit method of creating new sets of documents from old sets and
explicit document numbers. The difference between a <From Set> created
by a FROM clause of a FIND statement and a query set created by a MAKE
statement is that a <From Set> is temporary only and may not be
referred to again without explicitly re-creating it, while a query set
created by a MAKE statement is given exactly the same status as a query
set created by a FIND statement. It may be named, have comments
attached to it, and be referred to by query number or query set name in
a later query.

Examples of basic MAKE statements are:

MAKE 3*2

which directs EUREKA to create a new query set consisting of all
documents that appear in both query set #3 and query set #2.

MAKE 3 + 2 * CHEZFAC

which directs EUREKA to create a new query set composed of all
documents that are either in query set #3 or in both query set #2 and
the query set named "CHEZFAC" by the user.

MAKE 3 + [7,26,8]

which directs EUREKA to create a new query set composed of documents 7,
26, and 8, and also all documents that are in query set #3.

The options for the MAKE statement are the query naming clause and the
comments clause. Both of these options function exactly like their
counterparts for the FIND statement, so the reader is directed to
Section 2.7.1 if further description is required. The general form of
the MAKE statement is:

MAKE <Set Expression> = <Setname> "<Comment String>"

Examples of MAKE statements are:

MAKE 3 + FINALSET = NEWSET "CREATE NEW SET FROM FINALSET & 3"

M 3 * CHEZFAC + [7,13] "NO SET NAME"

M ROADSET - [11,7] = NEWROADSET "DELETE DOCS 11&7 FROM ROADSET"

2.7.4  CHANGE  STATEMENT 


The CHANGE statement is used to assign a name to a query set or to
change an existing name of a query set. Its general form is:

CHANGE <Query Set ID> TO <Query Set Name>

"CHANGE" is the keyword, which may be abbreviated "CH", that informs
EUREKA that a name assignment/change follows. "TO" is a keyword to
separate the old set ID from the new. <Query Set ID> is either a query
number or a query set name. <Query Set Name> is the new query set name
to be assigned to the query set identified in <Query Set ID>. This name
must obey the rules described for query set naming in Section 2.7.1.
Examples of CHANGE statements are:

CHANGE 3 TO GOODSET

which assigns the name "GOODSET" to query set #3.

CH GOODSET TO BADSET

which changes the name of the query set currently named "GOODSET" to
"BADSET".

Macros may be renamed by following the word "CHANGE" with the word
"MACRO". An example is:

CHANGE MACRO TX34J TO FRED

which changes the name of a macro named "TX34J" to "FRED". Note that
the keyword "MACRO" may not be abbreviated.

2.7.5  COMMENT STATEMENT

The COMMENT statement is used to assign comments to query sets or
individual documents. These comments may be retrieved upon demand by
use of the PRINT statement. The general form of the COMMENT statement
is:

COMMENT <Set/Document ID> "<Comment String>"

"COMMENT" is the keyword (which may be abbreviated "CO") to inform
EUREKA that what follows is an identifier for the set or document the
user wishes to add a comment to, and the double quotes act as
delimiters for the comment string. The <Set/Document ID> must be either
a query number, a query set name, or a document number enclosed in
square brackets ([ ]). The comment string must follow the rules
described in Section 2.7.1. Examples of COMMENT statements are:

COMMENT 3 "SOME COMMENT STRING"

CO [19] "VERY GOOD PAPER ON ""FRABBLEGIBBETS"""

The first example merely attaches the comment

SOME COMMENT STRING

to query set #3, while the second attaches the comment string

VERY GOOD PAPER ON "FRABBLEGIBBETS"

to document #19.

2.7.6  DELETE STATEMENT

The DELETE statement is used to delete query sets and/or comments from
the user file area. Once a query set has been deleted it cannot be
referred to in a MAKE statement or a FROM clause, but it no longer
takes up space in the user's file. Since each user is assigned only one
cylinder of disk, it is important to remove unwanted query sets and
comments when they are no longer needed.

The DELETE statement has three forms. The first looks like this:

DELETE <Set List>

"DELETE" is the keyword, and may be abbreviated "DEL". The <Set List>
is a list of query set names and query set numbers of query sets that
the user wishes to have deleted. This will remove both the query set
and all associated comments. Examples are:

DELETE 3,7,JONKSET

DEL BADSET

The second form is:

DELETE COMMENTS <Set/Doc List>

where "DELETE COMMENTS" is the keyword informing EUREKA to remove all
user comments attached to the query sets and/or documents that make up
the <Set/Doc List>. The keyword may be abbreviated "DEL COMMENTS". The
query sets in the list may be referred to by either query number or
query set name, while the documents must be referred to by a list of
document numbers separated by commas and enclosed in a single set of
square brackets.

When this form of the DELETE statement is used only the comments
attached to a query set or document are deleted, so the query sets may
be referred to in later statements (but the comments are no longer
available). Examples are:

DELETE COMMENTS 3,[15,23,5],GOODSET

DEL COMMENTS [18]

Note that users may delete comments from documents, but are not
allowed to delete the actual documents.

The third form of the DELETE statement is:

DELETE MACRO <Macro List>

As one would expect, this form is used for deleting macro definitions.
"MACRO" is the keyword informing EUREKA that the following list is a
list of macro definitions to be deleted rather than a list of query
sets to be deleted. Note that this command may be abbreviated to "DEL
MACRO <Macro List>".

If the user wishes to delete all or most of his/her macros, query sets,
and/or comments, he may specify that EUREKA is to delete all
sets/macros except the ones he wishes to save by putting the names
and/or numbers of the queries/macros to be saved in the <Set
List>/<Macro List> and preceding the <Set List>/<Macro List> with the
keywords "ALL EXCEPT".

Similarly, "DELETE ALL", "DELETE COMMENTS ALL", and "DELETE MACRO ALL"
delete all query sets, comments, and macros respectively, with no
exceptions.
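The "ALL EXCEPT" form keeps a set if either its number or its name appears in the list. The Python sketch below models this with two hypothetical tables, `query_sets` and `set_comments`, both keyed by query number; comments attached to individual documents are not modelled, and case-insensitive name matching is an assumption.

```python
def delete_all_except(query_sets, set_comments, keep):
    """DELETE ALL EXCEPT <Set List>: drop every query set (and its comments) not kept.

    query_sets:   hypothetical table, query number -> {"name": str or None, ...}
    set_comments: hypothetical table, query number -> list of comment strings
    keep:         query numbers and/or set names to preserve
    """
    keep_ids = {k.upper() if isinstance(k, str) else k for k in keep}
    for number in list(query_sets):             # copy keys: we delete while looping
        name = query_sets[number].get("name")
        if number in keep_ids or (name is not None and name.upper() in keep_ids):
            continue
        del query_sets[number]
        set_comments.pop(number, None)


sets = {3: {"name": "GOODSET"}, 5: {"name": None}, 7: {"name": "JONKSET"}}
comments = {3: ["KEEP FOR THESIS"], 7: ["JUNK"]}
delete_all_except(sets, comments, [5, "goodset"])
print(sorted(sets), comments)  # [3, 5] {3: ['KEEP FOR THESIS']}
```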

Examples of the use of "ALL" are:

DELETE ALL EXCEPT 5

which deletes all query sets and comments except query set #5.

DEL COMMENTS ALL

which deletes all user-assigned comments.

DELETE MACRO ALL EXCEPT BIGMAC

which deletes all macro definitions except the one named "BIGMAC".

2.7.7  LOGON STATEMENT

The LOGON statement is used to identify the user to the EUREKA system
so that it can retrieve the user files and initialize a workspace for
the user. The form of the LOGON statement is:

LOGON <User ID>

where "LOGON" is the keyword (it may not be abbreviated) and <User ID>
is the (up to) six-letter identification code assigned to each user. If
a person does not have a user ID, he may still use all facilities of
the system except the storage of results between terminal sessions. If
a user enters a query without first typing in a "LOGON" command, he is
automatically logged in as a public user and allowed full access to the
system. However, as soon as the public user logs off the system all of
his query sets and comments are erased in order that the next public
user may start with a clean slate. Examples are:

LOGON A3UKR7

LOGON FRED

2.7.8  LOGOFF STATEMENT

The  LOGOFF  statement  is  used  to  log  out  from  the  system 
after  a  session.  It  tells  EUREKA  to  save  all  the  user  files 
and  free  the  user's  workspace.   Its  form  is: 

LOGOFF 
and  there  are  no  variations  on  its  form. 

2.7.9  PRINT STATEMENT

The PRINT statement has three uses. It may be used to print all or part
of any document. It may also be used to print information about
previous queries and their related query sets. Another use is to print
macro definitions.

The form of the PRINT statement used for printing query set/statement
information is:

PRINT <Set ID 1> TO <Set ID 2>

where <Set ID 1> and <Set ID 2> are either query numbers or query set
names. This will cause the following information to be printed for
each query set with a query number between that of <Set ID 1> and <Set
ID 2>: query number, query set name (if present), query text, list of
all documents making up this query set and their relative rank, and any
comments associated with this query set. If "TO <Set ID 2>" is omitted,
only the information for the set specified by <Set ID 1> is printed.

Examples are:

PRINT 3 TO LAST

PRINT JOE

and

PRINT 3
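In the range form only the two query numbers matter, not the order in which the IDs are written. The Python sketch below resolves both IDs and returns the query numbers to be printed. `NAMES` and `EXISTING` are hypothetical tables, and two points are assumptions not stated in the text: that LAST means the most recent surviving query, and that deleted query sets are simply skipped.

```python
def print_range(first, second, names, existing):
    """Query numbers covered by PRINT <first> TO <second>, in ascending order.

    names:    hypothetical table, set name -> query number
    existing: query numbers whose sets have not been deleted
    """
    def number(set_id):
        if isinstance(set_id, int):
            return set_id
        if set_id.upper() == "LAST":
            return max(existing)          # assumption: LAST = most recent query
        return names[set_id.upper()]

    low, high = sorted((number(first), number(second)))  # order of the IDs is irrelevant
    return [n for n in sorted(existing) if low <= n <= high]


NAMES = {"JOE": 20}
EXISTING = {3, 5, 14, 17, 20, 25}
print(print_range("JOE", 14, NAMES, EXISTING))  # [14, 17, 20]
print(print_range(3, "LAST", NAMES, EXISTING))  # [3, 5, 14, 17, 20, 25]
```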

The form of the PRINT statement used for printing all or part of
documents is:

PRINT <Context List> FROM <Set/Doc ID>

where <Context List> is the list of context items that the user wishes
to have printed. Any list of context terms is valid here, as long as
they are meaningful. If "PARAGRAPH" or "SENTENCE" is specified here,
EUREKA looks at the query statement that generated the set from which
we wish to print information and then prints all paragraphs or
sentences containing the search terms specified by that query
statement. Therefore, it is not meaningful to command EUREKA to print a
sentence from a set created by a MAKE statement, since there are no
search terms in the MAKE statement to search for.

If no <Context List> is specified, the default value assumed is
"DOCUMENT". <Set/Doc ID> may be either of the following:

1) Set name or query number;

2) List of document numbers separated by commas and enclosed by square
brackets ("[" and "]").

The command:

PRINT <Context List> FROM <Set Name/Query #>

will cause the portions specified by the <Context List> of each
document in the set list of that query to be printed.

The command:

PRINT <Context List> FROM [Doc#1,Doc#2,...,Doc#N]

causes the portions (specified by the <Context List>) of documents
numbered "Doc#1", "Doc#2", etc., up to "Doc#N" to be printed.

All output from a PRINT statement is routed to the user's terminal,
unless he ends the PRINT statement with "ON LP", in which case all
output is routed to the line printer.

Examples are:

PRINT JOE

which prints all information (as described above) about the query and
query set named "JOE".

PRINT JOE TO 14

which prints the query set information for every query set with a
query number between 14 and that of set "JOE" (whether "JOE" has a
lower or higher number than 14).

PRINT FROM JOE ON LP

which prints the entire document text of every document in query set
"JOE" on the line printer. *** WARNING! BE CAREFUL WHEN USING THIS
COMMAND, AS IT CAN EASILY GENERATE IMMENSE AMOUNTS OF OUTPUT ***.

PRINT TITLE FROM 3

which prints the title of every document in query set #3.

PRINT TITLE, AUTHOR FROM [3]

which prints the title and author of document number 3.

P FROM [3,43,22] ON LP

which directs EUREKA to print documents 3, 43, and 22 on the line
printer. Notice that "PRINT" may be abbreviated by "P".

P SEN FROM NEWSET

which prints every sentence in each document in the query set "NEWSET"
that contains a term from the search expression of the FIND statement
that generated "NEWSET".

PRINT COMMENTS FROM [12]

which prints all user-assigned comments attached to document #12.

P COMMENTS FROM 12

which prints all comments the user has attached to any document in
query set #12.

The third use of the PRINT statement is printing macro definitions.
The form used is:

PRINT MACRO <Macro Name>

"PRINT MACRO" is the keyword specifying that the following word is to
be taken as the name of a user-defined macro and that this macro
definition is to be retrieved from the user file and printed. If the
word "ALL" is substituted for <Macro Name> then all macros and their
definitions are listed. The output may be routed to the line printer
by adding "ON LP" to the end of the command. Some examples are:

PRINT MACRO BUGS

which causes the definition of the macro "BUGS" (see Section 2.7.2) to
be printed on the terminal.

P MACRO BIGMAC ON LP

which causes the macro named "BIGMAC" by the user to be printed on the
line printer.

P MACRO ALL

which prints out all macro names and the macro text associated with
each macro name.

SYSTEM PROGRAMMER'S GUIDE

3.1  AN OVERVIEW OF EUREKA

In order to obtain an overview of EUREKA before being faced with the
gory details, let us examine a block diagram of the system structure
(Fig. 3.1.1). This diagram shows the structure of EUREKA at a task
level. The block labelled "Processor" is the actual PDP-11 hardware,
which is allocated and controlled by "DOS", the DEC operating system,
and by "EXECUTIVE", the EUREKA operating system. "DD" and "DE" are the
two Diva 2314-style disk drives where both system and user files
reside. "User1" and "User2" are the terminals and associated
non-deterministic, non-rational physical cellular automata (1). Neither
DOS nor EXECUTIVE shall be explained in detail here, as sufficient
documentation exists elsewhere [1,2]. One point we should consider
before proceeding, however, is the interrelationship of the two
operating systems. DOS, the standard DEC operating system, is used
primarily as a bootstrap and low-level software resource by EXECUTIVE,
which actually provides almost all of the multi-user scheduling,
allocation, and management facilities. All I/O requests, task startup
and control, and memory management are done by traps to EXECUTIVE.

(1) Sometimes referred to as "humans".

Each module is actually an invocation of a collection of one or more
object modules that are logically grouped together and may be treated
as a single unit by the EXECUTIVE for purposes of memory management and
process control. Within tasks, JSRs and JMPs may be used to transfer
control between modules, but between tasks the EXECUTIVE trap $PRFRM
must be used in order to maintain EXECUTIVE's task control. Modules
within a task share a common memory area, known as the "workspace",
which must be allocated by the EXECUTIVE trap $ALOC. This memory area
(and any other memory allocated by a task) may be accessed by any
routine in the task and by any routines in tasks initiated via $PRFRM
traps by the task, but by no others.

The Root Task or Root Node (ROOT) is an initialization and
housekeeping task used to initialize user and system files, etc. It
is the first module to be performed by EXECUTIVE upon starting up
the system.

User Interface (USRNTF) is the window between the users and the
internal EUREKA routines. It acts as a terminal handler and message
router by accepting and formatting commands from the user, performing
the Parser, and then displaying any error messages generated by
lower-level routines and/or prompting the user for his next command.
There is one invocation of User Interface in existence for each user
in the system at any given time. The Root Task starts up one
invocation of the User Interface for each terminal attached to the
system (this is determined by assembled-in constants in the code) at
the time EUREKA is initialized. This invocation remains in
existence for the duration of the execution (at least, until
"SHUTDOWN" is typed in to shut EUREKA down). All EUREKA routines
(except initialization routines) access user-dependent structures by
way of pointers and tables passed to them by higher level routines.
This allows EUREKA routines to be reentrant, greatly simplifying the
process of adding more users to the system and minimizing memory
usage, since only one copy of the code need exist to serve all users.
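
As an illustration of the reentrancy argument, the following Python sketch keeps no per-user state inside the shared routine; everything user-specific is reached through the Logon Block it is handed. It is not EUREKA code (the EUREKA routines are PDP-11 code), and the fields `user_id`, `query_count`, and `last_set`, like the routine `record_query`, are hypothetical stand-ins rather than the layout of Fig. 3.2.1.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LogonBlock:
    # Hypothetical per-user fields; the real layout is given in Fig. 3.2.1.
    user_id: int
    query_count: int = 0
    last_set: Optional[str] = None


def record_query(logon: LogonBlock, set_name: str) -> None:
    # Shared code: no globals, only state reached through `logon`,
    # so a single copy can serve every terminal.
    logon.query_count += 1
    logon.last_set = set_name


# One Logon Block per terminal, one copy of the routine for all users.
blocks = [LogonBlock(user_id=n) for n in range(2)]
record_query(blocks[0], "Q1")
record_query(blocks[1], "Q7")
record_query(blocks[0], "Q2")
```

Adding a terminal then means adding a Logon Block, not another copy of the code.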

The Parser (PARSER) decodes user commands by examining the
command string typed in by the user. It creates either a Search
Supervisor Table or a Set Handler Table that describes the services
requested by the user and contains all the information needed by
lower level routines to perform these services. The Parser then
performs the correct action routine. If the command parsed requires
either full-text searching or set list merging (FIND, MAKE, or PRINT
<Context> FROM), the Search Supervisor is performed upon completion of
the parse. If the command parsed was PRINT <Set ID>, then the Set
Information Printer (INFOPT) is performed. LOGON and LOGOFF are
handled internally by the Parser. All other commands (CHANGE,
DEFINE, COMMENT, and DELETE) cause the Set Handler to be performed.
Upon completion of the action routine, control is returned to the
Parser, which immediately returns control to the User Interface.
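
The Parser's choice of action routine can be summarized as a small dispatch function. This Python sketch works on a whitespace-split command line, a simplification of the real linear-scan parse; the returned strings are the task names given above, and deciding between the two PRINT forms by looking for a FROM keyword is an assumption about the command syntax.

```python
def action_routine(command: str) -> str:
    """Return the name of the task the Parser performs for a command line."""
    words = command.upper().split()
    if not words:
        raise ValueError("empty command")
    verb = words[0]
    if verb in ("FIND", "MAKE"):
        return "SRCHSP"
    if verb == "PRINT":
        # PRINT <Context> FROM ... needs a search; PRINT <Set ID> does not.
        return "SRCHSP" if "FROM" in words[1:] else "INFOPT"
    if verb in ("LOGON", "LOGOFF"):
        return "PARSER"  # handled internally by the Parser
    if verb in ("CHANGE", "DEFINE", "COMMENT", "DELETE"):
        return "SETHLR"
    raise ValueError(f"unknown command {verb!r}")
```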

The Search Supervisor (SRCHSP) is primarily a sequencing
routine that controls the operation of the EUREKA routines used in a
search. The Merger (MERGE), Index and Postings Handler (IPHNDL),
Full Text Searcher (FTSRCH), and Set Handler (SETHLR) are all used
by the Search Supervisor to complete a search.

The  Merger  (MERGE)  is  used  to  merge  lists  of  documents  together 
in  order  to  construct  lists  of  documents  meeting  the  conditions  of  a 
Boolean  function  specified  by  the  user  in  his/her  search  command. 
It  is  hoped  that  eventually  the  Merger  will  become  a  manager  for  a 
hardware  merge  unit  now  under  construction. 
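
The merge itself can be sketched in Python as the classic linear merge of sorted lists, assuming (this is not stated above) that each document list is kept in ascending order of document number without duplicates. The operator names `AND`, `OR`, and `ANDNOT` are illustrative, not EUREKA's query syntax.

```python
from typing import List


def merge(a: List[int], b: List[int], op: str) -> List[int]:
    """Linear-time merge of two ascending, duplicate-free document lists.

    op is "AND", "OR", or "ANDNOT" (documents in a but not in b).
    """
    if op not in ("AND", "OR", "ANDNOT"):
        raise ValueError(f"unsupported operator {op!r}")
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            if op in ("AND", "OR"):
                out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            if op in ("OR", "ANDNOT"):
                out.append(a[i])
            i += 1
        else:
            if op == "OR":
                out.append(b[j])
            j += 1
    # Copy whichever tail can still contribute to the result.
    if op in ("OR", "ANDNOT"):
        out.extend(a[i:])
    if op == "OR":
        out.extend(b[j:])
    return out
```

Each list is read once, front to back, which is also the access pattern a hardware merge unit would favour.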

The Index and Postings Handler (IPHNDL) is used to evaluate one
term at a time. It is given one search term by the Search
Supervisor and produces a list of documents in which this term
appears. Note that in the case of a term containing several tokens,
e.g. 'FULL TEXT', the Index and Postings Handler returns a list of
documents containing both the words "FULL" and "TEXT" with no
assurance that the string "FULL TEXT" actually appears. In this
case, the Index and Postings Handler marks each document in the list
by setting the full-text search bit in its descriptor (described in
Sec. 3.8). The Full Text Searcher must then be used to determine if the
words actually appear in the correct relationship within the
documents listed by the Index and Postings Handler.
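
A minimal Python sketch of this behaviour, assuming an in-memory dictionary from word to postings list in place of the index and postings files, and a plain boolean in place of the full-text search bit of Sec. 3.8; `evaluate_term` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def evaluate_term(index: Dict[str, List[int]], term: str) -> List[Tuple[int, bool]]:
    """Return (document number, full-text-search flag) pairs for one term."""
    tokens = term.upper().split()
    if not tokens:
        raise ValueError("empty search term")
    docs = set(index.get(tokens[0], []))
    for tok in tokens[1:]:
        docs &= set(index.get(tok, []))
    # With several tokens the inverted file only shows that each word occurs
    # somewhere in the document, not that they form the phrase, so every
    # document is flagged for the Full Text Searcher.
    needs_fts = len(tokens) > 1
    return [(d, needs_fts) for d in sorted(docs)]
```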

The Full Text Searcher (FTSRCH) performs the actual comparison
of search strings from the user's command to the text of documents
whenever a full-text search must be performed.

The Set Handler (SETHLR) performs all maintenance and accession
of the user's personal files. All entries, deletions, and reads
to/from this file must be done through this routine.

The Set Information Printer (INFOPT) is used to retrieve
information on previous queries and/or macros typed in by the user.
It uses the Set Handler to retrieve the desired information from the
user's personal file, formats it, and then displays it on either the
user's terminal or the line printer.

Now let us take a quick look at the information structures
manipulated by EUREKA. There are effectively four types of
information (excluding EXECUTIVE data) dealt with by EUREKA: the
user's command string; the documents in the database and their
associated accession mechanism; the user's Logon Block and personal
file; and command tables passed from one task to another. The
user's command string is entered by the user through the keyboard
and is passed, along with a pointer to the user's Logon Block, to
the Command Parser. The Logon Block is effectively EUREKA's record
of all system information specific to one user. The Command Parser
then builds either a Search Supervisor Table or a Set Handler Table,
depending on which action task is to be performed. If the command
is MAKE, FIND, or PRINT FROM, then a Search Supervisor Table is
constructed; otherwise a Set Handler Table is constructed. The
table is then passed to the correct action routine (again, along
with a pointer to the user's Logon Block). The action routines use
the tables passed to them to determine what actions are to be
performed (e.g. read a set list, search on a list of terms, etc.)
and the Logon Block to find the correct user file to use and other
such user-specific data. The document file and its associated
accession mechanism (including the index, postings, and hash files)
are used to perform the actual searches and to display text on the
user's terminal. The user's personal file is used to store the
record of his/her past searches, along with any macro definitions or
comments attached to sets or documents by the user.

All the above tasks and information structures will be
described in greater detail in the following sections.

3.2 ROOT TASK

The first module we shall consider is the Root Task or Root
Node (ROOT). This is a relatively uncomplicated module that
essentially gets things started for the rest of the system and then
closes up shop when the system is shut down.

The Root Task starts the system up by:

1) Calling (via a JSR) subroutine IRINIT, which opens all files
except the user files, .INITs the terminals, and sets up some
non-relocatable scratch spaces;

2) Performing n copies of the User Interface, where n is an
assembled-in constant with global name LGNUM, passing each the
address of a different Logon Block. The layout of the Logon Block is
shown in Fig. 3.2.1;

3) Doing a TRAP 5 to wait until all n copies of the User Interface
have executed $RETN traps (i.e. SHUTDOWN has been typed in at all
terminals).

Once  all  copies  of  the  User  Interface  have  died,  the  Root  Task 
closes  all  relevant  files  and  then  executes  a  $RETN  trap  to  return 
control  to  the  EXECUTIVE,  thus  shutting  down  the  system. 

3.3 USER INTERFACE

The User Interface (USRNTF) is the user's window into the
system. Each terminal has one invocation (task) of the User
Interface associated with it (via the Logon Block passed to the task
at the time it was initiated via a $PRFM trap by the Root Task).
This User Interface task handles query prompting, formatting, Parser
invocation, error message display, and statistics recording for the
user at its associated terminal and does not die until "SHUTDOWN" is
typed in at the terminal.

The first action performed by the User Interface is the setting
up of a buffer/statistics area. One component of this buffer is the
text buffer that is passed to the Command Parser (and hence the rest
of the system). Another large section of the buffer is the
statistics area in which all timing and frequency statistics are
recorded. Double buffering is used for the statistics area in order
to decrease the number of I/O requests made during operation of the
system. Once the buffers have been initialized, the User Interface
enters a loop in which it:

1) Clears and resets the least recently used statistics block;

2) Re-initializes the byte count in the terminal I/O buffer header;

3) Reads the next query typed in by the user into the text buffer;

4) Checks to see if the query was too long;

5) Checks for continuation to another line, and loops back to (2) if so;

6) Checks for "SHUTDOWN" having been typed in, and goes to the shutdown
routine if so;

7) Performs the Parser, passing it pointers to the Logon Block and
the query text string;

8) When control is returned from the Command Parser, displays the
error message (if any), or the "COMMAND COMPLETE" message if no error
has occurred;

9) Records statistics into the buffer and writes the block containing
both buffers if both buffers are full;

10) Loops back to (2).

The shutdown routine writes the user statistics block out to
disk (one buffer may be empty, depending on whether the user typed
in an even or odd number of queries), and then executes a $RETN
trap, returning control to the Root Task.
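
The double-buffered statistics scheme can be modelled as follows. This is a sketch under stated assumptions: each query produces one statistics record that fills one buffer, a Python list stands in for the user's statistics file on disk, and nothing is written at shutdown when both buffers are already empty. The class name `StatsBuffers` is hypothetical.

```python
from typing import List, Optional, Tuple


class StatsBuffers:
    """Two statistics buffers, written to disk together as one block."""

    def __init__(self) -> None:
        self.slots: List[Optional[dict]] = [None, None]
        self.next_slot = 0                  # least recently used buffer
        self.disk: List[Tuple[Optional[dict], Optional[dict]]] = []

    def record(self, stats: dict) -> None:
        self.slots[self.next_slot] = stats
        if self.next_slot == 1:             # both full: one write covers two queries
            self._flush()
        self.next_slot ^= 1

    def _flush(self) -> None:
        self.disk.append((self.slots[0], self.slots[1]))
        self.slots = [None, None]

    def shutdown(self) -> None:
        if self.slots[0] is not None:       # odd number of queries: second buffer empty
            self._flush()
```

With three queries, the first two share one write and the third goes out at shutdown with an empty partner buffer, which is the even/odd case described above.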

3.4 COMMAND PARSERS

The Command Parser module consists of a collection of different
routines that each parse one EUREKA command, plus several
subroutines used to perform common functions. Access to the Command
Parser module is through routine FIND, which parses the FIND,
DEFINE, LOGON, and LOGOFF statements. FIND initially allocates
workspace for the entire Command Parser module (placing the address
in R5), fills in the addresses of the beginning and end of the query
text in the workspace, and then does a JMP to the correct Command
Parser routine based on the first two or three letters of the
command.

The individual Command Parser routines shall not be described
here since they are very straightforward linear-scan,
fill-in-the-table routines.

Once the Command Parser routine has constructed a command table
of the appropriate type, it initiates via a $PRFM trap either
the Set Handler, Search Supervisor, or Set Information Printer,
depending on the instruction type.

As soon as the action routines execute $RETN traps, returning
control to the Command Parser, the Command Parser executes a $RETN
trap, returning control to the User Interface in order to begin the
cycle again.

3.5 SET HANDLER

The next task we shall consider is the Set Handler (SETHLR),
which maintains the users' personal files. All changes to the user
file or retrievals therefrom must be made by this task in order to
maintain the integrity of the user file structure.

3.5.1 USER FILE STRUCTURE

Each user in the EUREKA system is assigned a personal disk file
in which all user-specific records are stored. Since user
information is dynamic and of varying lengths and types, we must
have an access/storage system that can cope efficiently with rapidly
changing, non-homogeneous data. This system should also seek to
minimize the number of disk accesses required by common functions,
as they are currently one of the more troublesome bottlenecks in
EUREKA.

The data structure chosen for this task is shown in Fig.
3.5.1.1. It exists in the medium of a disk file consisting of one
cylinder of disk. This gives us 240 contiguous blocks of 256 16-bit
words. Since the blocks are allocated in one cylinder they may all
be accessed without moving the heads of the disk unit, thus avoiding
some seek time on sequential reads. The file is accessed in
relative (.BLOCK) mode, giving us a block address space of 0-239 and,
within each block, a byte address space of 0-511. User information
is stored in blocks 7-239, with blocks 0-6 being used as directory
space. The storage blocks (7-239) are divided (for allocation
purposes) into chunks of 64 bytes (8 per block). Allocation of file
space starts at the last block (239) of the file and grows toward the
front of the file. Chunks within each block are allocated from byte
448 proceeding back to byte 0. Since Set Handler routines requesting
disk space from the Bitmap Handler are only given a starting address,
which is actually the lowest byte of the lowest block number of the
contiguous space allocated to them, only the Bitmap Handler (ALOCD)
need be concerned with this allocation pattern. The Bitmap Handler
keeps track of which chunks of disk are in use by recording their
status in a bitmap which occupies the last 240 bytes of the user's
Logon Block when the user is logged in, or the first 240 bytes
(0-239) of block 0 of the user's disk file when logged out.

In this bitmap, a bit value of "0" implies the corresponding
chunk is in use, while a bit value of "1" implies that the chunk is
free. The mapping scheme that allows us to associate bits with
chunks will be discussed along with the Bitmap Handler in Sec.
3.5.8.
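
The allocation pattern can be made concrete with a small Python sketch. The real bit-to-chunk mapping is the one described in Sec. 3.5.8; the mapping used here (bitmap byte b covers block b, bit k covers the k-th 64-byte chunk of that block) is an assumption, chosen because it fits the arithmetic: 240 bytes of 8 bits give 1920 bits, one for each of the 8 chunks in each of the 240 blocks. The function names are hypothetical.

```python
from typing import Optional, Tuple

BLOCKS = 240              # one disk cylinder
CHUNK_BYTES = 64
CHUNKS_PER_BLOCK = 8      # 512-byte blocks / 64-byte chunks
FIRST_STORAGE_BLOCK = 7   # blocks 0-6 hold the directories


def new_bitmap() -> bytearray:
    """Fresh 240-byte bitmap: bit 1 = free, 0 = in use; directory blocks in use."""
    return bytearray([0x00] * FIRST_STORAGE_BLOCK
                     + [0xFF] * (BLOCKS - FIRST_STORAGE_BLOCK))


def _is_free(bitmap: bytearray, chunk: int) -> bool:
    # Assumed mapping: byte = block number, bit = chunk number within the block.
    return bool((bitmap[chunk // 8] >> (chunk % 8)) & 1)


def _mark_used(bitmap: bytearray, chunk: int) -> None:
    bitmap[chunk // 8] &= 0xFF ^ (1 << (chunk % 8))


def allocate(bitmap: bytearray, nbytes: int) -> Optional[Tuple[int, int]]:
    """Find a contiguous run of free chunks, searching from block 239, byte 448
    toward the front of the file. Returns (block, byte offset) of the lowest
    address in the run, or None if no run is long enough."""
    need = max(1, -(-nbytes // CHUNK_BYTES))        # ceiling division
    run = 0
    for chunk in range(BLOCKS * CHUNKS_PER_BLOCK - 1, -1, -1):
        run = run + 1 if _is_free(bitmap, chunk) else 0
        if run == need:                             # `chunk` is the lowest address
            for c in range(chunk, chunk + need):
                _mark_used(bitmap, c)
            return chunk // CHUNKS_PER_BLOCK, (chunk % CHUNKS_PER_BLOCK) * CHUNK_BYTES
    return None
```

Because only the lowest address of the run is returned, a caller never sees that the search runs backwards, which is why only ALOCD needs to know the allocation pattern.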

The rest of the first block (block 0) of the user's file is
occupied by a one-byte (byte 240) free directory block number, a
one-byte (byte 241) valid/invalid flag for the bitmap, and the
user's macro directory in words 121-253 (bytes 242-507). See Figure
3.5.1.2 for details. The free directory block number is the
relative block number of the first directory block (blocks 1-6) that
has an unused entry in it. The valid/invalid flag is used to
prevent users from accidentally wiping out information in their
files by logging back on after a system crash or other calamity,
occurring while they were logged on, has destroyed the copy of their
bitmap in their Logon Block before it could be rewritten to disk.
Whenever a user logs on, the bitmap is read in from their disk file
and stored in their Logon Block, and the valid/invalid flag is set to
"-1" as a flag that the bitmap on disk is no longer current. When the
user logs back out, the updated bitmap is transferred from the user's
Logon Block back to the first 240 bytes of block 0 of the user's
file and the valid/invalid flag is set to zero to indicate that the
bitmap is current again. Should the system crash while the user is
logged on, the bitmap must be rebuilt by the off-line routine BITFIX,
which builds a new bitmap for the user based on the current contents
of the user's file.
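
The logon/logoff protocol for the valid/invalid flag can be sketched as below. The text does not say what LOGON does when it finds the flag still set to "-1"; refusing the logon until BITFIX has run is an assumption made here, as are the class and exception names.

```python
from dataclasses import dataclass

STALE = -1    # flag value while the user is logged on
CURRENT = 0   # flag value when the disk bitmap is up to date


@dataclass
class UserFileBlock0:
    bitmap: bytearray     # bytes 0-239 of block 0
    flag: int = CURRENT   # byte 241, the valid/invalid flag


class BitmapStale(Exception):
    """The disk bitmap must be rebuilt off line (BITFIX) before use."""


def logon(block0: UserFileBlock0) -> bytearray:
    if block0.flag == STALE:
        # The previous session never logged off, so the bitmap on disk
        # may not match the file contents.
        raise BitmapStale("bitmap not current; run BITFIX")
    block0.flag = STALE
    return bytearray(block0.bitmap)   # working copy kept in the Logon Block


def logoff(block0: UserFileBlock0, working: bytearray) -> None:
    block0.bitmap = bytearray(working)
    block0.flag = CURRENT
```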

Words 121 through 253 of the first block are taken up by the
user's macro directory. The macro directory consists of 19 seven-word
slots structured as in Fig. 3.5.1.2. A 5-word (10 character)
macro name is followed by the starting address (block number and
offset within block) of the macro text. At this starting address
will be found one word containing the length of the macro text in
characters, followed by the macro text. This directory (and the
entire file) is maintained by the Set Handler routine ALOCD, the
Bitmap Handler, which uses the first word of each macro directory
slot as a flag for allocation of directory slots. If the first word
of a directory slot contains zero (in binary, not the character),
that directory slot is free; any other value shows that the directory
slot is in use as a pointer to some macro text.
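
Reading the macro directory can be illustrated with Python's `struct` module. The sketch assumes PDP-11 byte order (little-endian words, first character in the lower-addressed byte) and blank-padded names; `macro_directory` and `macro_text` are hypothetical helper names, and the whole user file is treated as one byte string addressed as block * 512 + offset.

```python
import struct
from typing import List, Tuple

MACRO_DIR_OFFSET = 242            # bytes 242-507 of block 0 (words 121-253)
MACRO_SLOTS = 19
SLOT = struct.Struct("<10sHH")    # 5-word name, block number, offset: 7 words


def macro_directory(block0: bytes) -> List[Tuple[str, int, int]]:
    """List (name, block, offset) for every slot whose first word is nonzero."""
    entries = []
    for i in range(MACRO_SLOTS):
        name, block, offset = SLOT.unpack_from(block0, MACRO_DIR_OFFSET + i * SLOT.size)
        if name[:2] == b"\x00\x00":       # first word binary zero: free slot
            continue
        entries.append((name.decode("ascii").rstrip(), block, offset))
    return entries


def macro_text(file_bytes: bytes, block: int, offset: int) -> str:
    """Read the length word at the macro's address, then that many characters."""
    addr = block * 512 + offset
    (length,) = struct.unpack_from("<H", file_bytes, addr)
    return file_bytes[addr + 2: addr + 2 + length].decode("ascii")
```

The 19 slots of 14 bytes end exactly at byte 507, matching the word range given above.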

The next six file blocks (1-6) are occupied by the query set
directory, as shown in Fig. 3.5.1.3. This directory is structured
much like the macro directory, the main difference being more fields
within each directory slot. Each query set directory slot can be
seen to be a 14-word long block containing a query set number, query
set name, and pointers to the starting addresses of all of the
pertinent information on disk that makes up the query set. Both the
query text and query set list are stored in the same manner as the
macro text (length word followed by information). The comments are
slightly more complex, as they are stored as a one-way linked list
so that comments may be added to existing queries or documents.
Refer to Fig. 3.5.1.4 for details of the comment chain structure.
For comments, the field labeled "1ST COMM PTR" points to the first
comment in the chain. Each comment consists of a length word
followed by a two-word link field (block number and displacement)
that points to the next comment in the chain. The comment text
follows immediately after the link. The last comment in the chain
is pointed at by the field of the directory entry labeled
"LAST COMM PTR" and is also flagged by having the high-order bit in
the first word of the link field set to 1. Also stored in this word
is the total length of all comments (retrieved by setting bit 15 to
0). The tail pointer in the directory entry is used to speed up the
attaching of comments by allowing the File Writer routine to avoid
running down the chain each time a new comment is to be added. The
bit flag is used to signal the end of the list to routines that are
running down the list, such as the File Reader or the Delete
Routine. The total-length field is used to put an upper bound on the
buffer size needed to read in user comments. If a query has no
comments attached to it, both of its comment pointers are set to
"-1" to flag their non-existence.

Document directories occupy the same slots as query slots in order
to avoid having three kinds of directories. A document directory
entry is distinguished from a query directory entry by having bit 15
of the query/document number field set to 1. Again, in order to
retrieve the document number one must clear this bit. The only
fields in a document directory entry that are meaningful are the
query/document number, as just discussed, and the comment pointers.
The comment chain attached to a document is stored in exactly the
same form as one attached to a query set. As in the case of the
macro directory, the first word of the query/document directory slot
is used as a flag word. If this word contains a negative value,
then it is a document directory; if it contains a positive number,
it is a query directory; if it contains zero, it is an unused
directory slot.
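
The comment chain and the slot flag word can be sketched as follows, again assuming little-endian 16-bit words and treating the user file as one byte string addressed as block * 512 + displacement. Reading a "-1" pointer word as the unsigned value 0xFFFF, and the function names, are assumptions of this sketch.

```python
import struct
from typing import List, Tuple

BLOCK_BYTES = 512
LAST_FLAG = 0x8000     # bit 15 of the first link word marks the last comment
NO_COMMENTS = 0xFFFF   # a "-1" pointer word, read as unsigned 16 bits


def slot_kind(first_word: int) -> str:
    """Classify a query/document directory slot by its signed first word."""
    if first_word == 0:
        return "unused"
    return "document" if first_word < 0 else "query"


def read_comments(file_bytes: bytes, block: int, disp: int) -> Tuple[List[str], int]:
    """Follow a comment chain from its 1ST COMM PTR.

    Returns the comment texts and the total-length field of the last link."""
    if block == NO_COMMENTS:
        return [], 0
    texts: List[str] = []
    while True:
        addr = block * BLOCK_BYTES + disp
        # Length word, then the two-word link, then the comment text.
        length, link1, link2 = struct.unpack_from("<HHH", file_bytes, addr)
        texts.append(file_bytes[addr + 6: addr + 6 + length].decode("ascii"))
        if link1 & LAST_FLAG:
            return texts, link1 & 0x7FFF      # clear bit 15 to get the total
        block, disp = link1, link2
```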

3.5.2 SET HANDLER TABLE

The Set Handler may be invoked by the Parser when processing
CHANGE, DELETE, or COMMENT statements; by the Search Supervisor when
processing MAKE or FIND statements; and by the Set Information
Printer when processing PRINT commands. These tasks use the EXECUTIVE
trap $PRFRM to start up the Set Handler.

These routines communicate with the Set Handler via a table
known as the Set Handler Table. The address of the Set Handler
Table must be placed in register R0 by the performing task. There
are effectively three different kinds of Set Handler Tables, as
shown in Figs. 3.5.2.1, 3.5.2.2, and 3.5.2.3. The table shown in
Fig. 3.5.2.1 is used for all read and write operations (op codes
0-6, 12, 13, 17). The contents of the read/write form of the Set
Handler Table are as follows:


OFFSET   CONTENTS

0-1      address of Logon Block
2        command (one byte only)
3        not used
4-5      query/document # (high-order bit is set to 0 if query #,
         1 if document; this word is set to 0 for a macro read or write)
6-15     query/macro name
16-17    address of query/macro text
18-19    address of set list
20-21    address of comments

A list of Set Handler commands is given in Table 3.5.2.4. Note that
not all fields of this table will be filled in on every invocation
of the Set Handler. Read and write commands need fill in only those
addresses pertaining to the items to be read/written (there is one
exception, however; see the section describing FILRDR). For
instance, if a command of "2" (read query text and set list) is
used, then only the addresses of buffers in which to store the query
text and the set list need be allocated. Similarly, if a command of
"6" (write comments) is used, then only the address of the buffer
containing the comments to be attached to the specified set need be
filled in. Also, it is not necessary to fill in both the set name
and set number on a read, either one being sufficient. The address
of the Logon Block, a command, and either a set name or set number
must always be present, however. In order to avoid accidental
matches to incorrect sets in the directory search, care should be
taken to clear unused set name/number fields when only one of the
two is being used. When not being used, the query/macro name should
be set to blanks, and the query/document number should be set to
zero.
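
The read/write form of the table is a fixed 22-byte record, so its layout maps directly onto a `struct` format string. The packing function `rw_table` below is a hypothetical helper for illustration only; the addresses are arbitrary 16-bit values, and it follows the advice above by blank-filling an unused name and zeroing an unused number.

```python
import struct

# Offsets 0-21 of the read/write form; "<" = little-endian, no padding.
RW_TABLE = struct.Struct("<HBBH10sHHH")
DOCUMENT_BIT = 0x8000


def rw_table(logon_addr: int, command: int, number: int = 0,
             is_document: bool = False, name: str = "", text_addr: int = 0,
             setlist_addr: int = 0, comments_addr: int = 0) -> bytes:
    """Pack a read/write Set Handler Table (unused byte 3 is zero)."""
    if is_document:
        number |= DOCUMENT_BIT            # high-order bit marks a document number
    name_field = name.ljust(10)[:10].encode("ascii")   # blanks when unused
    return RW_TABLE.pack(logon_addr, command, 0, number, name_field,
                         text_addr, setlist_addr, comments_addr)
```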

The next type of Set Handler Table we shall consider is the
delete form of the Set Handler Table (Fig. 3.5.2.2), used only when
deleting query sets, comments, or macro definitions (command codes
7-10, 14, 15). Its contents are the same as the read/write form
except for several minor variations. Byte three contains the number
of items to be deleted, unless a code meaning "delete all except"
(codes 8, 10, 15) is used, in which case byte three contains the
number of set/macro identifiers that are to be saved. Note that in
this case a zero in byte three implies "delete all". The only other
difference from the read/write form is the absence of buffer
addresses. These are replaced by a series of six-word long blocks
containing a query number and/or name for each query/macro to be
saved or deleted, depending on the command used. On a "delete"
command (codes 7, 9, 14) the identifiers of the sets/comments/macros
to be deleted are listed, while in the case of a "delete all except"
(codes 8, 10, 15) the identifiers of the items to be saved are listed.
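
The two delete variants differ only in whether the listed identifiers are removed or kept. In the sketch below, identifiers are plain strings rather than six-word number/name blocks, and the length of `ids` stands in for the count in byte three; both are simplifications, and `apply_delete` is a hypothetical name.

```python
from typing import Iterable, List

DELETE_LISTED = {7, 9, 14}
DELETE_ALL_EXCEPT = {8, 10, 15}


def apply_delete(existing: Iterable[str], code: int, ids: List[str]) -> List[str]:
    """Return the identifiers that survive a delete-form command."""
    existing = list(existing)
    if code in DELETE_LISTED:
        doomed = set(ids)
        return [x for x in existing if x not in doomed]
    if code in DELETE_ALL_EXCEPT:
        # An empty list (count zero in byte three) keeps nothing: "delete all".
        keep = set(ids)
        return [x for x in existing if x in keep]
    raise ValueError(f"{code} is not a delete-form command code")
```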

The last Set Handler Table to consider is the rename format
(Fig. 3.5.2.3). This table is used only for rename commands (codes
11 and 16) and has a very simple layout. The first three words of
the table are identical with the read/write form (except for the
command code, of course). These three words are then followed by the
old query/macro name (5 words) and the new name to be assigned to the
query/macro (5 words).

3.5.3 SET HANDLER SUPERVISOR

Now that we have an understanding of the input to this module,
let us consider the routines contained in the Set Handler module and
delineate their individual functions. Figure 3.5.3.1 shows the
various routines in the Set Handler and their interconnections.
Notice that the only entry path into the Set Handler is through the
Set Handler Supervisor (SETHLR). This routine of the Set Handler
module acts as a startup routine for all the action routines within
the Set Handler. Its functions are:

1) Allocation of workspace for all routines;

2) Translation of all requests for action on the "last" set into a
specific set name/number;

3) Determination of which action routine to call (via a "JMP"
command).

The sequence of operations performed by the SETHLR routine is as
follows:

1) Check for a valid command code;

2) Check to see if this is a reference to the "last" set (signalled
by "LAST" in the query name). If "LAST" occurs in any set name
slot, then move in the name/number of the "last" set, if it exists;

3) Allocate workspace, put address in R5;

4) Initialize the .TRAN Block in the LOGON Block (using the last 320
(500 octal) bytes of the workspace, starting at byte 72 (120 octal),
as the I/O buffer);

5) Do a JMP to the correct action routine.

No file accesses are made from this routine, the "last" set
information being obtained from the LOGON Block. The only data
structures modified by this routine are the LOGON Block (the .TRAN
Block is set up and the "last" set name is changed on rename of the
"last" set) and the Set Handler Table (the "last" set identifier is
moved in if the "last" set is referenced). For further information,
refer to the program listing.
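
The supervisor's sequence can be sketched as a Python function over dictionaries standing in for the Set Handler Table and the LOGON Block. Which codes go to FILRDR is inferred here (the read/write codes minus the write codes 4-6 and 13 listed under the File Writer), the names of the delete and rename routines are not given in this section, and raising an error when no "last" set exists is an assumption. Workspace allocation and .TRAN Block setup are omitted.

```python
from typing import Dict, Optional, Tuple

READ_CODES = {0, 1, 2, 3, 12, 17}   # inferred, see lead-in
WRITE_CODES = {4, 5, 6, 13}
DELETE_CODES = {7, 8, 9, 10, 14, 15}
RENAME_CODES = {11, 16}


def sethlr(table: Dict, logon: Dict) -> str:
    code = table["command"]
    # 1) check for a valid command code
    if code not in READ_CODES | WRITE_CODES | DELETE_CODES | RENAME_CODES:
        raise ValueError(f"invalid Set Handler command {code}")
    # 2) translate a reference to the "last" set into its number and name
    if table.get("name", "").startswith("LAST"):
        last: Optional[Tuple[int, str]] = logon.get("last_set")
        if last is None:
            raise LookupError("no last set exists")
        table["number"], table["name"] = last
    # 3)-4) workspace and .TRAN Block setup omitted in this sketch
    # 5) choose the action routine to jump to
    if code in READ_CODES:
        return "FILRDR"
    if code in WRITE_CODES:
        return "FILWRT"
    return "delete routine" if code in DELETE_CODES else "rename routine"
```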

3.5.4 FILE WRITER

Now  let  us  consider  the  File  Writer  (FILWRT).  This  routine 
accepts  a  Set  Handler  Table  containing  a  "write"  command  and  pointer 
to  information  to  be  written  to  disk  in  the  user's  personal  file. 
This  routine,  the  delete  routine,  and  the  logon/logoff  routines  are 
the  only  ones  allowed  to  alter  the  user's  file. 

When entered (via a "JMP" from routine SETHLR), FILWRT expects
to find the address of a Set Handler Table in register R0 and the
address of its workspace in R5. This Set Handler Table should
contain a command code of 4-6 or 13. A set identification, macro
name, or document number must also be present, along with a pointer
to the user's Logon Block. Last, but not least, there must be some
pointers to the buffer(s) containing the information to be written
to disk. If a query set is being written out to disk, then either a
command of 4 or 5 will be used, depending on whether the user has
attached comments to the set or not. The pointers to the query text
and to the set list must be present in either case. The pointer to
the query text is the address of a buffer containing:

1) A pointer to the Logon Block;

2) A pointer to the carriage return - line feed ending the text
string;

3) The actual text of the query.

This strange format is due to the current User Interface/Parser
communication protocol. The File Writer computes the length of the
query text (without the carriage return - line feed) from the
starting address of the text and the address of the CR-LF. When
written to disk, the Logon Block pointer and CR-LF pointer are
replaced by a word containing the length of the query text in
characters (note that this is identical with the number of bytes of
storage). Similarly, the set list pointer points to a buffer
containing:

1) The number of documents in the set (1 word);

2) Two words per document, the document number and its relative rank
in the set list.

The number of documents is transformed into a length in bytes (i.e.
multiplied by 4) before being written to disk in order to simplify
the read mechanism. If comments are present, the comment pointer is
the address of a buffer containing:

1) Two blank words for use by the File Writer as link words;

2) The length of the comments in bytes;

3) The comment string.

The output format of the comment string will be discussed along with
the comment string writing mechanism.
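
The two conversions the File Writer applies before writing can be sketched as below, treating memory as a byte string indexed by 16-bit address with little-endian words. The function names are hypothetical, and only the length computations described above are modelled, not the disk transfer.

```python
import struct


def query_text_record(memory: bytes, buf: int) -> bytes:
    """Turn [Logon Block ptr][CR-LF ptr][text CR LF] into [length][text]."""
    _logon_ptr, crlf_ptr = struct.unpack_from("<HH", memory, buf)
    text_start = buf + 4
    length = crlf_ptr - text_start        # characters == bytes; CR-LF excluded
    return struct.pack("<H", length) + bytes(memory[text_start:crlf_ptr])


def set_list_record(memory: bytes, buf: int) -> bytes:
    """Replace the document count by a byte count (two words = 4 bytes each)."""
    (count,) = struct.unpack_from("<H", memory, buf)
    nbytes = count * 4
    return struct.pack("<H", nbytes) + bytes(memory[buf + 2: buf + 2 + nbytes])
```

After conversion both records share the "length word followed by information" form used for macro text, which is what lets one read mechanism handle all of them.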

When writing out a new query set, the File Writer first calls
routine ALOCD, the disk file space allocator, to obtain a directory
entry slot to use for the new query set. As soon as this request is
granted, the query set identification number and name are moved into
the area in the workspace reserved for building the new query
directory entry. Next the query text length is computed and stored
at the head of the text in the output buffer. Disk space for the
text is then requested by another JSR to ALOCD. When the disk
address is returned by ALOCD, it is entered in the workspace copy of
the directory entry record and is passed (along with appropriate
pointers to the text buffer, Logon Block, etc.) to the routine
WRTDSK, a subroutine that formats the information into disk block
size and writes it to disk.

The query set list and comments are then handled in a similar
fashion. Upon completion of their transfer to disk, the File Writer
must write the directory entry to disk and update the "last" set
information in the Logon Block (since the new query set is by
definition the new "last" set). When these duties have been
completed, the File Writer performs the EXECUTIVE trap $RETN,
effecting a return of control from the Set Handler to the performing
task.

Macro text writes (code 13) and comment-only writes (code 6) are
handled somewhat differently. A macro text write resembles a query
set write. However, since neither a set list nor comments will be
present, the macro directory entry is shorter than the regular query
set directory entry, leading to many directory size kludges in the
File Writer (see Figs. 3.5.1.1 and 3.5.1.2 for diagrams of the
directory layouts). Another important difference is the format of
the text buffer passed to the File Writer. Unlike the query text,
the macro text is passed in a buffer containing the character count
of the text followed by the text itself. Since this more closely
resembles the format of the document list (at least after conversion
of the document count into a byte count), the macro text is put in
the correct format and written to disk by the same section of the
File Writer that handles the set list.

Comment writes (code 6) are the most complex operations performed
by the File Writer. Three distinct cases must be considered:

1)  Adding  comments  to  a  query  set  that  currently  has  no  comments; 

2)  Adding  comments  to  a  document  that  currently  has  no  comments; 

3)  Adding  comments  to  a  query  set  or  document  that  has  previously 
had  comments  attached  to  it. 

We discover into which category a given request falls by examining
the query/document number and the directory entry for the query set
(or the existence/non-existence of a directory entry in the case of
documents). The subroutine DIRSRH is used to search the directory
for this information. If the entity to which the comments are to be
attached is a document and no comments have been previously attached,
then no directory entry will exist for this document. If the entity
is a query set with no currently attached comments, a directory entry
must exist, but the pointer to the comment string will be set to -1.
For either a document or a query set with existing comments, the
directory entry for the document or query set will contain pointers
(disk addresses) to the head and tail of the comment chain for that
entity (see Fig. 3.5.1.4 for a diagram of the structure of the
comment list). In light of this information, we can see how the
File Writer must proceed. First, the Directory Searcher (DIRSRH) is
called to find out whether a directory entry exists for this entity
and to retrieve a copy if it does (non-existence is flagged by the
Directory Searcher by moving -1 to the first word of the buffer in
which it has been requested to place the directory copy).
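
A minimal Python sketch of how the comment-write cases might be told apart from the directory copy returned by DIRSRH. The `DirCopy` fields and all names are hypothetical stand-ins for the directory entry layout. It also assumes that a document whose comments were deleted keeps an entry with a -1 comment pointer; such an entry is handled like the query set case.

```python
from dataclasses import dataclass
from enum import Enum

MISSING = -1  # DIRSRH moves -1 into the first word when no entry exists


@dataclass
class DirCopy:
    first_word: int         # -1 => no directory entry was found
    comment_head: int = -1  # disk address of the first comment, -1 => none
    comment_tail: int = -1  # disk address of the last comment


class CommentCase(Enum):
    NEW_ENTRY = 1       # document with no entry: allocate slot and space
    ENTRY_NO_CHAIN = 2  # entry exists but the comment pointer is -1
    APPEND = 3          # existing chain: link a new tail comment


def classify(is_document: bool, d: DirCopy) -> CommentCase:
    if d.first_word == MISSING:
        # Only documents can lack an entry; a query set always has one.
        if not is_document:
            raise ValueError("query set has no directory entry")
        return CommentCase.NEW_ENTRY
    if d.comment_head == -1:
        return CommentCase.ENTRY_NO_CHAIN
    return CommentCase.APPEND
```
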

If the entity is a document and no directory entry exists, the
File Writer performs the following actions:

1) Requests a directory slot and disk space for the comments from
ALOCD;

2) Fills out the directory entry with both the head and tail
pointers pointing to the newly allocated disk space in which the
comments are to be written;

3) Moves "-1" to the first word of the output buffer for the
comments (again, see Fig. 3.5.1.4 for the comment chain layout) to
flag the non-existence of further links in the chain;

4) Moves the length of the comment string to the second and third
words (total comments length and local string length, respectively);

5) Writes out the directory entry and comment string.

Attaching comments to a query set that currently has no comments is
done in a similar fashion, differing only in that the existing
directory entry is modified instead of a new directory entry slot
being allocated. In the case of adding a new comment to a query set
or document that already has one or more comments attached to it,
the File Writer must:

1) Read the comment pointed to by the tail pointer of the directory
entry;

2) Move the word containing the total length of the existing
comments (second word of the old last comment) to the second word of
the new tail comment and add in the length of the new comments;

3) Move the disk address of the new comments to the link field of
the old tail comment and rewrite the old tail comment (or the first
block thereof if it extends across a block boundary);

4)  Update  the  tail  pointer  in  the  directory  record  to  point  to  the 
new  comment; 

5)  Write  out  the  new  comment  record; 

6)  Rewrite  the  updated  directory  record; 

7)  Check  to  see  if  the  "last"  set  has  been  modified,  and  update  the 
"last"  set  information  in  the  Logon  Block  if  so. 

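The append procedure can be sketched in Python by modelling the disk as a dictionary of comment records and using a simple counter in place of ALOCD. All names are hypothetical; steps 6 and 7 (rewriting the directory record and updating the Logon Block) have no counterpart in this in-memory model.

```python
from dataclasses import dataclass, field

END = -1  # link value marking the tail of a comment chain


@dataclass
class Comment:
    link: int       # disk address of the next comment, -1 at the tail
    total_len: int  # running total length of all comments so far
    local_len: int  # length of this comment's own string
    text: str


@dataclass
class DirEntry:
    head: int = END  # disk address of the first comment
    tail: int = END  # disk address of the last comment


@dataclass
class Disk:
    records: dict = field(default_factory=dict)
    next_addr: int = 0

    def alloc(self) -> int:
        """Hypothetical stand-in for ALOCD: hand out the next free address."""
        addr = self.next_addr
        self.next_addr += 1
        return addr


def add_comment(disk: Disk, entry: DirEntry, text: str) -> None:
    new_addr = disk.alloc()
    if entry.tail == END:
        # No chain yet: head and tail both point at the new record.
        disk.records[new_addr] = Comment(END, len(text), len(text), text)
        entry.head = entry.tail = new_addr
        return
    old_tail = disk.records[entry.tail]     # 1) read the old tail comment
    total = old_tail.total_len + len(text)  # 2) carry the total forward
    old_tail.link = new_addr                # 3) link old tail to the new one
    entry.tail = new_addr                   # 4) update the tail pointer
    disk.records[new_addr] = Comment(END, total, len(text), text)  # 5) write


disk, entry = Disk(), DirEntry()
add_comment(disk, entry, "abc")
add_comment(disk, entry, "de")
print(entry.head, entry.tail, disk.records[1].total_len)  # 0 1 5
```
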

3.5.5 FILE READER

Now that we have seen the mechanism used for writing data into
the users' files, let us consider the mechanism used for retrieving
that data. This routine of the Set Handler is called the File
Reader (FILRDR). It is entered by a JMP command from the SETHLR
routine whenever SETHLR detects a read op code (0-3, 11-14) in the
Set Handler Table. For each of these op codes, the Type I Set
Handler Table is used. The only other routine referenced by the
File Reader is the Directory Searcher (DIRSRH).

The first operation the File Reader performs (after a minute
amount of housekeeping) is to determine what is requested of it. If
a list of macros or a list of all comments is requested, then
special sections of the File Reader are JMP'ed to (MACALL and
COMALL, respectively). If a macro text or any type of set
information is requested, then FILRDR must obtain the directory
entry for the data to be read. This is done by one of two methods.
If the data requested is anything but information from the "last"
set, then the Directory Searcher (DIRSRH) is called via a JSR
command. If the information desired is from the "last" set
(signalled by a set name of "LAST" in the Set Handler Table), then
the section of the File Reader following label LASTRT is used to
obtain the information directly from the user's Logon Block, thus
avoiding one or more disk reads. In either case, the address of the
disk block containing the disk directory for the set/macro to be
read is returned, along with the directory entry itself (in working
storage), to the File Reader. If the read requested was from the
"last" set, but no "last" set exists, then a "-1" is put in the
first word of the Set List Buffer and a $RETN trap is executed in
order to allow a "read last set" to be done on the first query of a
session without causing undue problems. Note that this forces us to
fill in the set list pointer of all Set Handler Tables referencing
the "last" set and check for this error condition in each routine
doing a read from the user's file, even if we are not requesting a
set list read. If a read is requested from any non-existent set
other than the "last" set, an $ERROR trap return is done.

Once we have the directory entry for the selected set or macro, we
may begin to read in the requested information. The query text, set
list, comments, and macro definition reads are all done by
essentially the same section of code (starting at label READIT),
with the parameters for the read loop set to point to the correct
buffers, etc. by a compare-branch tree preceding the loop on each
iteration. The code starting with label DIDIT is a trailer section
that follows each pass through the read loop and determines whether
another pass is needed. This occurs whenever both the query text and
set list must be read for a set, or when comments are being read and
the one just read has a pointer to another comment chained to it.
Once all the information has been read in, the File Reader exits via
a $RETN trap at label DONE.

The two special case sections of the File Reader, MACALL and
COMALL, are straightforward linear searches that merely obtain the
desired lists of macro/document IDs. Macros are listed in the buffer
pointed to by the query text pointer, with one macro name every five
words, and the number of macro names is stored in the query number
field of the Set Handler Table.
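
For comment reads, the READIT/DIDIT read loop described above amounts to walking a linked list until the -1 terminator. A minimal Python sketch, assuming each comment record is a tuple (link, total length, local length, text); the function name and record shape are hypothetical.

```python
def read_comment_chain(records: dict, head: int) -> list:
    """Collect comment strings from the head of the chain to the -1 link."""
    texts = []
    addr = head
    while addr != -1:  # like DIDIT: loop again while a link remains
        link, _total, _local, text = records[addr]  # like READIT: one read
        texts.append(text)
        addr = link
    return texts


chain = {7: (3, 3, 3, "abc"), 3: (-1, 5, 2, "de")}
print(read_comment_chain(chain, 7))  # ['abc', 'de']
```
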

Commented  document  lists  are  returned  in  the  set  list  buffer, 
with  the  first  two  words  of  the  buffer  being  identical  counts  of  the 
number  of  document  numbers  following.  After  the  two  count  words 
come  the  document  numbers  themselves.  These  are  in  the  form  of  two 
word  entries,  the  first  word  being  the  document  number  and  the 
second  zero  in  order  to  simulate  a  normal  set  list. 
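
The commented-document list format can be shown in a few lines. This is an illustrative sketch with a hypothetical function name, using Python integers for 16-bit words.

```python
def commented_doc_list(doc_numbers: list) -> list:
    """Two identical count words, then (document number, 0) pairs."""
    count = len(doc_numbers)
    buf = [count, count]
    for doc in doc_numbers:
        buf += [doc, 0]  # the zero stands in for the rank of a normal set list
    return buf


print(commented_doc_list([12, 40]))  # [2, 2, 12, 0, 40, 0]
```
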

This  concludes  our  discussion  of  the  File  Reader  and  allows  us 
to  proceed  to  the  next  major  routine  in  the  Set  Handler  Module,  the 
Delete  Routine  (DELRTN). 

3.5.6 DELETE ROUTINE

The Delete Routine (DELETR) handles requests to delete query
sets, comments, or user-defined macros from a user's file. It does
this by finding the directory entry for the information to be
deleted, then either zeroing the first word of the directory entry
(if the request is to delete a query set or a macro) or moving "-1"
to the comment pointers (if comments are to be deleted). After this
has been performed, the Delete Routine calls the Bitmap Handler
(ALOCD), which frees the disk space involved.

In finding the directory entry for a query/macro to be deleted,
the Delete Routine reads in the user's directory (one block at a
time) and scans each block linearly. This approach is used rather
than calling the Directory Searcher to locate the directory entry, in
order to minimize the number of disk accesses when the user is
deleting a large number of queries at once. Each query/macro
directory entry scanned is compared to the list of queries/macros
attached to the Set Handler Table passed to the Delete Routine by the
Delete Statement Parser. The directory address is passed to the
portion of the Delete Routine that performs storage deallocation and
directory deletion in either of two cases:

1) The query/macro identifier matches one of the identifiers in the
list attached to the table and a "delete" command code (7 for query
sets, 9 for comments, 14 for macros) has been entered in the command
table;

2) No match has been found in the identifier list for the directory
entry and a "delete all except" command code (8 for query sets, 10
for comments, 15 for macros) has been entered in the command table.

Otherwise, the directory entry is left unaltered and the search
continues. It should be noted that "delete all" for query sets,
macros, or comments is denoted by a "delete all except" command with
a null list of identifiers.

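The selection rule reduces to a two-way test on the command code. A minimal Python sketch using the codes quoted above; the function name is hypothetical, and `id_list` stands for the identifier list attached to the Set Handler Table.

```python
DELETE_CODES = {7, 9, 14}              # delete query set / comments / macro
DELETE_ALL_EXCEPT_CODES = {8, 10, 15}  # delete all except ...


def should_delete(command: int, identifier, id_list) -> bool:
    matched = identifier in id_list
    if command in DELETE_CODES:
        return matched
    if command in DELETE_ALL_EXCEPT_CODES:
        return not matched  # an empty list therefore means "delete all"
    raise ValueError(f"not a delete command code: {command}")


print([q for q in ["Q1", "Q2", "Q3"] if should_delete(8, q, ["Q2"])])  # ['Q1', 'Q3']
```
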
Disk storage deallocation for query set lists, query texts, and
macro texts is done by a loop that reads in the information to be
deleted in order to get the length of the disk space to be freed and
then calls the Bitmap Handler (ALOCD), which marks the space free in
the user's bitmap. Comments are handled in a similar fashion, but
by a recursive routine (CHASER) that runs down the links of the
comment chain, marking the disk space occupied by each comment free
in the bitmap.
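
A recursive chain walker in the spirit of CHASER might look like the following sketch. It assumes each comment record is (link, total length, local length, text) and that the space to free is a three-word (6-byte) header plus the comment string; both that layout and the `free` callback standing in for ALOCD are assumptions, and all names are hypothetical.

```python
HEADER_BYTES = 6  # assumption: link, total length and local length words


def chaser(records: dict, addr: int, free) -> None:
    """Recursively free every comment from addr to the end of the chain."""
    if addr == -1:  # -1 link: end of the chain
        return
    link, _total, local_len, _text = records[addr]
    free(addr, HEADER_BYTES + local_len)  # hand the space back to the bitmap
    chaser(records, link, free)           # then run down the next link


freed = []
chaser({1: (2, 3, 3, "abc"), 2: (-1, 5, 2, "de")}, 1,
       lambda addr, nbytes: freed.append((addr, nbytes)))
print(freed)  # [(1, 9), (2, 8)]
```
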

After the disk space has been marked free in the bitmap, the
comment pointers in the directory are set to -1 if only comments
have been deleted, or the first word of the directory entry is set to
zero if an entire query set/macro definition has been deleted.

After all items in the query/macro identifier list have been
deleted or the end of the directory has been reached, control is
returned to the Delete Statement Parser (and hence to the User
Interface) via a $RETN trap.

3.5.7 RENAME ROUTINE

The  Rename  Routine  (RENAME)  of  the  Set  Handler  module  is  an 
extremely  simple  routine.  Its  purpose  is  to  attach  a  (new)  mnemonic 
name  to  a  query  set  or  to  a  user  defined  macro. 

The Rename Routine accepts as input in register R0 the address
of  a  Type  III  Set  Handler  Table  (see  Section  3.5.2).  This  table 
contains  the  current  identifier  (name  or  number)  of  a  query  set  or 
user  macro  and  the  new  name  to  be  attached  to  the  set/macro.  The 
Rename  Routine  uses  the  Directory  Searcher  (DIRSRH)  to  retrieve  the 
address  of  the  directory  entry  for  the  set/macro  to  be  renamed.  The 
Rename  Routine  then  reads  in  the  disk  block  containing  the  directory 
entry,  replaces  the  name  field  of  the  directory  entry  with  the  new 
name  from  the  Set  Handler  Table,  rewrites  the  block  containing  the 
directory  entry,  and  exits  via  a  $RETN  trap. 

3.5.8 BITMAP HANDLER


The Bitmap Handler (ALOCD) is used to allocate/deallocate disk
storage and directories for the Set Handler. This routine is
performed as a subroutine (via a JSR) by the File Writer and the
Delete Routine and receives all of its parameters through the Set
Handler Workspace, whose base address is held in register R5
throughout the Set Handler. The workspace parameters used by this
routine are the Logon Block pointer (LOGPTR), amount requested
(NBRREQ), block number (RELBLK), and offset in block (BLKDSP). The
exact layout of the Set Handler Workspace is stored as a template
macro in the file SYSMAC.SML.

The  NBRREQ  field  of  the  workspace  is  used  to  hold  the  number  of 
bytes  of  storage  to  be  allocated/deallocated  or  a  directory 
allocate/deallocate  flag  when  either  query  or  macro  directories  must 
be  manipulated.  The  Logon  Block  pointer  contains  the  address  of  the 
user's  Logon  Block,  needed  by  this  routine  for  reading  directories 
from  the  user's  file  when  doing  directory  allocates/deallocates. 
The block number and offset in block fields are used for passing
the starting disk address of the space allocated, or to be
deallocated, by the Bitmap Handler.

There are essentially five cases of Bitmap Handler action
requests to consider:

1)  Allocating  user  file  space; 

2)  Deallocating  user  file  space; 

3)  Allocating  macro  directories; 

4)  Allocating  query  set  directories; 

5)  Deallocating  query  set/macro  directories. 

Deallocation of directories, whether they are macro or query set
directories, is very straightforward. The Bitmap Handler is passed
the disk address of the offending directory entry in the disk
address fields of the workspace and need merely read in the
required block, move zero to the first word of the entry, rewrite
it, and return. Allocation of directories is only slightly more
complex. When requesting allocation of a directory, the calling
routine (FILWRT) places a flag describing the type of directory
desired in the NBRREQ field of the workspace (these flags, MACMSK
and DIRMSK, are contained in the file SYSMAC.SML). The two short
routines MACALC and DIRALC are used to allocate macro and query set
directories, respectively. Both routines work by searching linearly
through the proper directory until they find a directory slot with a
first word of zero or reach the end of the directory. If a free slot
is found, its block number and offset within the block are placed in
the RELBLK and BLKDSP fields of the workspace, and the Bitmap Handler
executes an RTS to return control to the calling routine. If the
end of the directory is reached without finding a free directory
slot, a "-1" is moved to the RELBLK field of the workspace to signal
this fact and an RTS is done.
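
The linear slot search performed by MACALC and DIRALC can be sketched as follows, assuming the directory is given as a list of blocks, each a list of 16-bit words, and that entries have a fixed size in words. The function name is hypothetical; the -1 return mirrors the RELBLK signal described above.

```python
def find_free_slot(blocks: list, entry_words: int) -> tuple:
    """Return (RELBLK, BLKDSP) of the first entry whose first word is zero."""
    for blk_no, block in enumerate(blocks):
        for w in range(0, len(block) - entry_words + 1, entry_words):
            if block[w] == 0:
                return blk_no, 2 * w  # byte offset: 2 bytes per word
    return -1, 0  # RELBLK = -1: no free slot (BLKDSP is then meaningless)


dirs = [[5, 1, 1, 1, 9, 2, 2, 2], [0, 0, 0, 0, 3, 3, 3, 3]]
print(find_free_slot(dirs, 4))  # (1, 0)
```
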

When file space allocation is requested, the number of bytes to
be allocated is placed in the NBRREQ field. The Bitmap Handler
calls subroutine VALCNK, which converts this number to the number of
chunks that must be allocated (i.e. ceiling(bytes/64)). The Bitmap
Handler then looks through the bitmap until it finds enough
contiguous free chunks to satisfy the request. These bits are then
cleared, the address of the lowest numbered block allocated is
placed in RELBLK, and the address of the lowest byte allocated in
this block is placed in the BLKDSP field. The higher level routines
therefore receive the starting address of a contiguous segment of
disk space guaranteed to be at least as large as they requested, but
possibly spread across several blocks.

File space deallocations are done in a similar fashion. The
number of bytes to be freed is placed in NBRREQ, but with bit 15
set to "1" as a flag that a deallocation is being requested. The
starting address of the disk string to be freed is placed in the
RELBLK and BLKDSP fields, as with directory deletes. The Bitmap
Handler then translates the number of bytes into the corresponding
number of chunks and sets the proper bits in the bitmap back to "1"
to show that the space is free.
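
Allocation and deallocation can be sketched against a simplified bitmap: a flat Python list with one element per chunk, 1 meaning free, indexed in allocation order. This flattening is an assumption made for clarity; translating a chunk position into a block number and displacement is a separate step. Function names are hypothetical.

```python
CHUNK = 64            # bytes per chunk
DELETE_FLAG = 0x8000  # bit 15 of NBRREQ marks a deallocation


def chunks_needed(nbytes: int) -> int:
    """ceiling(bytes / 64), the conversion attributed to VALCNK."""
    return -(-nbytes // CHUNK)


def bitmap_request(bitmap: list, nbrreq: int, start: int = -1) -> int:
    """Allocate (returns first chunk index, or -1) or free chunks; 1 = free."""
    if nbrreq & DELETE_FLAG:
        n = chunks_needed(nbrreq & ~DELETE_FLAG)
        for i in range(start, start + n):
            bitmap[i] = 1  # mark the chunks free again
        return start
    n = chunks_needed(nbrreq)
    run = 0
    for i, bit in enumerate(bitmap):
        run = run + 1 if bit else 0  # length of the current free run
        if n and run == n:
            first = i - n + 1
            for j in range(first, i + 1):
                bitmap[j] = 0  # clear bits: chunks now allocated
            return first
    return -1  # no contiguous run is large enough


bm = [1] * 16
print(bitmap_request(bm, 130), bm[:4])  # 0 [0, 0, 0, 1]
```
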

Before attempting to understand the address translation
mechanism, it is perhaps wise to reconsider the bitmap layout.
There are 16 bits per word in the bitmap and 8 chunks per block of
disk, so each byte of the bitmap covers one block and each word
covers two blocks of disk chunk space. Since blocks are allocated
from the highest block down and we associate the first byte of the
bitmap with the last block of the file, the difference in bytes
between any byte and the first byte of the bitmap is the number of
blocks down from the highest block in the disk file. The low-order
bit in a byte corresponds to the chunk starting at displacement 0 in
that block, the next one to the chunk starting at offset 64 bytes,
and so on up to the high-order bit (bit 7), which corresponds to the
chunk starting at byte 448 within the block. To convert a bitmap
address into a disk address, therefore, we must subtract the offset
in bytes into the bitmap from the highest block number in the file
to get the block number. To get the displacement in the block, we
multiply the relative bit position within the byte by 64.
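
The translation just described, together with its inverse, in a short Python sketch. It assumes 64-byte chunks and eight chunks (one bitmap byte) per block, as stated above; the function names are hypothetical.

```python
CHUNK = 64  # bytes per chunk; 8 chunks make a 512-byte block


def bitmap_to_disk(byte_offset: int, bit: int, highest_block: int) -> tuple:
    """(offset into bitmap, bit 0-7) -> (block number, displacement in block)."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7")
    return highest_block - byte_offset, bit * CHUNK


def disk_to_bitmap(block: int, displacement: int, highest_block: int) -> tuple:
    """Inverse mapping, as needed when freeing space at a given disk address."""
    return highest_block - block, displacement // CHUNK


print(bitmap_to_disk(3, 7, 99))  # (96, 448)
```
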

3.6 SEARCH SUPERVISOR

The Search Supervisor (SRCHSP) module is the scheduling and
control module that orchestrates the performance of the Merge
Routine, Set Handler, Index and Postings Handler, and Full-Text
Searcher modules in the execution of a search.

The Search Supervisor may be performed by the FIND Statement
Parser, the PRINT Statement Parser, or the MAKE Statement Parser and
is passed the address of a Search Supervisor Table (Fig. 3.6.1.1) in
register R0. This table contains all the pertinent information
collected from the user's query by the command parser.

3.6.1 SEARCH SUPERVISOR TABLE

Since the primary function of the Search Supervisor (and the
rest of EUREKA, for that matter) is to perform the operations
described in the Search Supervisor Table, we shall take a detailed
look at its contents and form.

Referring to Fig. 3.6.1.1, we see that the first three words
pointed at by R0 are reserved for EXECUTIVE use. This actually
reflects an earlier incarnation of the EXECUTIVE, and these words
are not currently used.

The  next  word  in  the  table,  "PTR  TO  TTY  STRING",  is  the  address 
of  the  buffer  containing  the  text  of  the  user's  query.  The  next  six 
words  are  all  pointers  (memory  addresses)  to  various  information 
blocks  within  the  table  that  shall  be  described  later. 

The next block, "IN CONTEXT", is a three-word descriptor
identifying the contexts in which the term expression must occur.
Similarly, the block labeled "PRINT CONTEXT" is a one-word
descriptor identifying the contexts to be printed from documents
that satisfy the search request.

The word labeled "DEVICE" is a descriptor specifying whether
the information is to be printed on the line printer or displayed
on the user's screen.

The five-word block "SET NAME" is the mnemonic name the user
wishes to have attached to the resulting query set.

"TERMS" and "TERM QUADS" are the two blocks that specify the
search terms for this query and the relationship in which they are
to occur. Fig. 3.6.1.2 describes the structure of these two blocks.
The "TERM QUADS" block holds a set of four-word descriptors which
form a binary operation tree that describes the search expression.
The first descriptor in the block is the root node of the operation
tree.


Each descriptor is made up of four words, the first being a
word of bit-flags, the second and third being pointers to the left
and right hand terms of the expression (with respect to the operator
described at this node), and the fourth being a pointer to the
address at which the results of this operation are to be placed.
The bit-flag word is broken down as follows:

Bits 15,14: Operation to be performed;
    00 => OR
    10 => AND
    11 => AND NOT
Bit 11: 1 => Suffixing to be performed on left hand side term.
Bit 10: 1 => Prefixing to be performed on left hand side term.
Bit 8:  1 => Left hand side pointer points to another node (term
        quad) rather than a leaf (term).
Bit 3:  same as bit 11, only for right hand side.
Bit 2:  same as bit 10, only for right hand side.
Bit 0:  same as bit 8, only for right hand side.

Note that bits 11, 10, and 8 are bits 3, 2, and 0 of the high-order
byte, so the left hand side flags in the high-order byte mirror the
right hand side flags in the low-order byte.
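
Decoding the bit-flag word of a term quad is a matter of masks and shifts. A minimal Python sketch with hypothetical field names; the op code 01, which the table above does not define, is reported as undefined.

```python
OPS = {0b00: "OR", 0b10: "AND", 0b11: "AND NOT"}


def decode_flags(word: int) -> dict:
    """Split a term-quad bit-flag word into its fields."""
    def bit(n: int) -> bool:
        return bool(word >> n & 1)
    return {
        "op": OPS.get(word >> 14 & 0b11, "undefined"),
        "left_suffix": bit(11),
        "left_prefix": bit(10),
        "left_is_node": bit(8),   # left pointer refers to another quad
        "right_suffix": bit(3),
        "right_prefix": bit(2),
        "right_is_node": bit(0),  # right pointer refers to another quad
    }


# Root of FIND 'A' * 'B' * 'C': AND, left side is the quad for 'A' * 'B'
f = decode_flags(0x8100)
print(f["op"], f["left_is_node"])  # AND True
```
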

The node-leaf selector bit allows us to handle cases of the
form:

FIND 'A' * 'B' * 'C'

Since we use binary operations exclusively, we must first AND
together the list of documents responding to the first search term
with the list of documents responding to the second search term,
thus producing a temporary result to be ANDed with the list of
documents responding to the third search term. Whenever the
node-leaf bit is set to 1 for either the left or right hand side, the
"TERM POINTER" points to the term quad of the operation that must be
performed in order to generate the temporary result list needed to
perform the operation described in the current node. If the
right/left node-leaf bit is set to 0, then the right/left term
pointer is the address of the search term to be used in the
right/left hand side of the current operation. The "TERMS" block
contains all the terms pointed at by the term pointers just
described. The terms are laid out sequentially in the "TERMS"
block, each consisting of a length word followed by the text of the
term. The term pointers actually point to the length words.

The result pointer in each term quad is filled in by the Search
Supervisor upon completion of the operation specified in that node
and is used whenever a parent node term pointer points at the
current node.

"SET QUADS" and "SETS", which describe the set expression for
this search, are structured in the same form as "TERM QUADS" and
"TERMS". The only significant differences are that the
prefixing-suffixing bits in the bit-flag words are meaningless here
and the "SET POINTER" points to a six-word block containing either a
query set number in its first word or a query set name in the last
five words. The indirect (node-leaf) bit, results pointer, and op
code bits are the same as for the "TERM QUADS".

3.6.2 SEARCH SUPERVISOR OPERATION

Now that we have an understanding of the Search Supervisor
Table, the description of the Search Supervisor is relatively
trivial. The only difficulties occur in attempting to describe the
handling of the various "special cases" that can occur. We shall
therefore first look at the main structure of the code and then go
back and describe how the "special case" handlers fit within the
framework of the body of the module.

The first action of the Search Supervisor (aside from some
housekeeping) is to start up the Set Expression Evaluator (STEVL),
which evaluates the set expression as contained in the "SET QUADS"
and "SETS" blocks of the Search Supervisor Table. While the Set
Expression Evaluator is in progress, the Search Supervisor does some
initialization of working lists for use in evaluating the term
expression. As soon as the Set Expression Evaluator finishes, the
Search Supervisor evaluates the term expression by use of a loop
which effectively traverses the "TERM QUADS" tree in postorder,
using the Index and Postings Handler (IPHNDL) to read in the
postings list for each leaf (search term), and the Merge Routine
(MERGE) to perform the operation specified at each node of the "TERM
QUADS" tree.
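
The postorder evaluation of the "TERM QUADS" tree can be sketched with Python sets standing in for postings lists and a `merge` function standing in for the Merge Routine. This is an assumption-laden illustration: the module described above uses a loop over the quads and merges disk-resident postings lists, whereas this sketch recurses and keeps everything in memory. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Quad:
    op: str                   # "OR", "AND" or "AND NOT"
    left: Union["Quad", str]  # a Quad plays the role of node-leaf bit = 1
    right: Union["Quad", str]
    result: Optional[frozenset] = None  # filled in once the node is done


def merge(op: str, a: frozenset, b: frozenset) -> frozenset:
    if op == "OR":
        return a | b
    if op == "AND":
        return a & b
    if op == "AND NOT":
        return a - b
    raise ValueError(op)


def evaluate(node: Quad, postings) -> frozenset:
    """Postorder: both subtrees first, then the operation at this node."""
    def side(x):
        return evaluate(x, postings) if isinstance(x, Quad) else postings(x)
    node.result = merge(node.op, side(node.left), side(node.right))
    return node.result


# FIND 'A' * 'B' * 'C'  ->  (A AND B) AND C
index = {"A": {1, 2, 3, 4}, "B": {2, 3, 4}, "C": {3, 4, 5}}
root = Quad("AND", Quad("AND", "A", "B"), "C")
print(sorted(evaluate(root, lambda t: frozenset(index[t]))))  # [3, 4]
```

The same traversal applies to the "SET QUADS" tree, with the Set Handler supplying query set lists at the leaves.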

The Search Supervisor then performs the Full-Text Searcher
(FTSRCH), which determines whether any of the documents require
full-text searching and, if so, performs the search. Refer to Sec.
3.10 for details of the operation of the Full-Text Searcher.

Upon completion of the Full-Text Searcher execution, the Search
Supervisor prints the message:

n DOCUMENTS POSTED TO THIS SET

and then constructs a Set Handler Table describing the results of
this search and performs the Set Handler in order to save a record
of this search for future use by the user.

All  that  remains  then  is  the  not  inconsiderable  task  of 
cleaning  up  by  freeing  all  the  temporary  lists  used  in  the  term 
expression  evaluation  and  freeing  all  the  disk  space  used  by  the 
Merger.  A  $RETN  trap  is  then  executed  to  return  control  to  the 
parser  routine  that  initiated  (via  a  $PRFM  trap)  the  Search 
Supervisor. 

Now  we  shall  look  at  the  special  cases.  If  the  Search 
Supervisor  has  been  performed  by  a  PRINT  Statement  then  no  term 
expression  exists  and  the  section  of  code  that  evaluates  this 
expression  must  be  skipped.  The  Search  Supervisor  must,  however, 
handle  print  requests  that  access  user  comments.  This  requires  a 
call  to  the  Set  Handler  to  get  the  list  of  all  documents  which  have 
user  comments  attached  to  them.  The  Search  Supervisor  must  also 
skip  over  the  call  to  the  Set  Handler  that  saves  the  search  results 
in order to avoid generating a new query set from a PRINT statement.

MAKE Statements present a similar problem in that they have no
term expression to be evaluated, but this is easily handled by
skipping the term expression evaluation process.

The last major "special case" to consider is the case where one
or more search terms contain no alphanumeric characters and
therefore do not have entries in the index file. If this case
arises, the Search Supervisor must construct a set list consisting
of all the document accession numbers in the entire database for
this term and force full-text searching of all of them.

For  further  details,  refer  to  the  program  listing. 

3.7 SET EXPRESSION EVALUATOR

The Set Expression Evaluator (STEVL) performs the function of
evaluating the set expression contained in the "SET QUADS" and
"SETS" blocks of the Search Supervisor Table (see Fig. 3.6.1.1;
also refer to the preceding section of this report for a
description of the table layout). The address of the Search
Supervisor Table is passed to this routine in register R0.

Once one understands the structure of the "SETS" and "SET QUADS"
sections of the Search Supervisor Table, the operation of the Set
Expression Evaluator is self-evident. It traverses the operation tree
in postorder, using the Set Handler to retrieve document lists (query
set lists) from the user's file for the leaves and the Merger (MERGE)
to perform the operations specified at the nodes. The final result is
put in the area of the Search Supervisor Table labeled "RESULTS".
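The postorder evaluation can be illustrated with a minimal Python
sketch. The node classes, the `merge` stand-in for the Merger, and the
dictionary standing in for the user's file are all hypothetical; the
real routine works on the "SET QUADS" and "SETS" blocks of the Search
Supervisor Table, whose layout is not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Leaf:
    """Hypothetical leaf: names a stored query set in the user's file."""
    set_name: str


@dataclass
class Node:
    """Hypothetical inner node: one Boolean operation on two subtrees."""
    op: str  # "AND", "OR" or "AND NOT"
    left: "Expr"
    right: "Expr"


Expr = Union[Leaf, Node]


def merge(op: str, a: List[int], b: List[int]) -> List[int]:
    """Stand-in for the Merger: Boolean operation on accession lists."""
    sa, sb = set(a), set(b)
    if op == "AND":
        result = sa & sb
    elif op == "OR":
        result = sa | sb
    elif op == "AND NOT":
        result = sa - sb
    else:
        raise ValueError(f"unknown operation {op!r}")
    return sorted(result)


def evaluate(expr: Expr, user_file: Dict[str, List[int]]) -> List[int]:
    """Postorder traversal: both children are evaluated before the node."""
    if isinstance(expr, Leaf):
        # Stand-in for the Set Handler retrieving a query set list.
        return sorted(user_file[expr.set_name])
    left = evaluate(expr.left, user_file)
    right = evaluate(expr.right, user_file)
    return merge(expr.op, left, right)
```

For example, with sets A = [1, 3, 5, 7], B = [3, 4, 5] and C = [5], the
tree (A OR B) AND NOT C evaluates to [1, 3, 4, 7].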


3.8  INDEX  AND  POSTINGS  HANDLER 


The purpose of the Index and Postings Handler (IPHNDL) module is to
determine in which documents the user's search terms occur, so that we
need only consider those documents rather than searching the entire
database to find documents that satisfy the user's search expression.

3.8.1 FILES MANIPULATED BY THE INDEX AND POSTINGS HANDLER

Before attempting to fathom the details of the Index and Postings
Handler, let us consider the file structures it manipulates. This file
structure consists of two hash tables, HASH1 and HASH2 (read in from
files HASH1.XXX and HASH2.XXX by IRINIT); the index file, INDEX.XXX;
and the postings file, PSTNG.XXX. The "XXX" extension on each file name
is used to distinguish between various versions of EUREKA and also
between various databases.


The two hash tables are used to get a disk address in the index file.
HASH1 is used to hash on the first letter of a token and HASH2 is used
to hash on the second letter of the token. The sum of the values
obtained (via a process explained in Sec. 3.8.2) from HASH1 and HASH2
gives a disk address in the index file (INDEX) where terms beginning
with these two characters are indexed. This section of the index file
is then searched linearly for a match to the entire token, until a
match is found or the scan passes the position the token would occupy
in lexicographic order. If a match is found, a pointer into the
postings file (PSTNG) is obtained from the index file. The postings
file contains the list of all documents in which the token under
consideration occurs. Each entry in the postings file contains a
document accession number, context bits to describe the context(s) in
which the token occurs, and a count of the number of occurrences of the
term within the document.

Let  us  now  look  at  the  layouts  of  the  files   and   tables   in   a 
semi-tabular  format. 

3.8.1.1 HASH1 TABLE

Table Name:  HASH1

Size:        256 words

Content:     Each word contains a full-word value used to hash the
             character which indexes it. If the value of the word is
             FFFF base 16, then the character does not exist in the
             index.
3.8.1.2 HASH2 TABLE

Table Name:  HASH2

Size:        256 bytes

Content:     Each byte contains a byte value used to hash the character
             which indexes it. If the value of the byte is FF base 16,
             then the character does not exist in the index.
3.8.1.3 INDEX FILE

File Name:         INDEX.XXX
Type:              Contiguous
Blocking Factor:   2
Format:

    Next Block Pointer
    Number of Types in this Block (N)
    then, repeated N times (one entry per type):
        Length of this Type (n)
        Type (n bytes)
        Directory Address of Postings
        Offset into Postings Block
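As an illustration only, the sketch below decodes one index block laid
out as above and scans it for a token. It assumes 16-bit little-endian
words (as on the PDP-11), ASCII type text followed by one pad byte when
n is odd so that later fields stay word-aligned, and entries sorted in
ascending lexicographic order; none of these details is confirmed by
the figure. `IndexEntry`, `build_index_block`, `parse_index_block`,
and `find_in_block` are hypothetical names.

```python
import struct
from typing import List, NamedTuple, Optional, Tuple


class IndexEntry(NamedTuple):
    token: str
    directory_address: int  # disk address of the postings block
    offset: int             # offset of this type's postings in that block


def build_index_block(next_block: int, entries: List[IndexEntry]) -> bytes:
    """Pack entries in the assumed layout (used here to make test data)."""
    out = bytearray(struct.pack("<HH", next_block, len(entries)))
    for e in entries:
        raw = e.token.encode("ascii")
        out += struct.pack("<H", len(raw)) + raw
        if len(raw) % 2:
            out += b"\0"  # assumed pad byte to realign on a word boundary
        out += struct.pack("<HH", e.directory_address, e.offset)
    return bytes(out)


def parse_index_block(block: bytes) -> Tuple[int, List[IndexEntry]]:
    """Return the Next Block Pointer and the N entries of one block."""
    next_block, count = struct.unpack_from("<HH", block, 0)
    pos = 4
    entries: List[IndexEntry] = []
    for _ in range(count):
        (length,) = struct.unpack_from("<H", block, pos)
        pos += 2
        token = block[pos:pos + length].decode("ascii")
        pos += length + (length % 2)  # skip the assumed pad byte
        address, offset = struct.unpack_from("<HH", block, pos)
        pos += 4
        entries.append(IndexEntry(token, address, offset))
    return next_block, entries


def find_in_block(entries: List[IndexEntry],
                  token: str) -> Optional[IndexEntry]:
    """Linear scan that stops once the token's sorted position is passed."""
    for entry in entries:
        if entry.token == token:
            return entry
        if entry.token > token:  # assumes ascending order
            break
    return None
```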
3.8.1.4 POSTINGS FILE

File Name:   PSTNG.XXX
Type:        Contiguous
Format (first block for a type):

    Next Block Pointer
    Total Postings This Type (N)
    Postings This Block (n1)
    Postings (n1 entries)

N > n1 implies that the postings for a type are split across blocks.
In this case, the next block has the following format:

    Next Block Pointer
    Postings This Block
    Postings

Each posting consists of two words holding the Document Number, the
context bits describing the context(s) in which the term occurs, and
COUNT, the number of occurrences of the term within the document.
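A hedged sketch of following a type's postings across chained blocks.
It models the disk as a dictionary of word lists, assumes the type's
three-word header sits at a given offset within its first block (the
index file supplies such an offset), and keeps each two-word posting as
an undecoded pair, since the bit layout within the two words is not
given here. The name `read_postings` is hypothetical.

```python
from typing import Dict, List, Tuple

Posting = Tuple[int, int]  # the two words of one posting, left undecoded


def read_postings(disk: Dict[int, List[int]], block_no: int,
                  offset: int = 0) -> List[Posting]:
    """Collect all postings of one type, following Next Block Pointers."""
    words = disk[block_no]
    next_block, total, in_block = words[offset:offset + 3]
    pos = offset + 3
    postings: List[Posting] = []
    while True:
        for _ in range(in_block):
            postings.append((words[pos], words[pos + 1]))
            pos += 2
        if len(postings) >= total:
            break  # all N postings of this type have been collected
        # N > n1: continue in the next block, which has a two-word header
        words = disk[next_block]
        next_block, in_block = words[0], words[1]
        pos = 2
    return postings
```

With a type whose three postings are split two and one across blocks 7
and 9, `read_postings(disk, 7)` returns all three pairs in order.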

3.8.2 OPERATION OF THE INDEX AND POSTINGS HANDLER

Now that we have analyzed the files manipulated by the Index and
Postings Handler, let us look at the operation of the routine itself.

Upon entry, register R0 should point at a table of six words
containing:

1) A prefix/suffix descriptor for the term.

2) A pointer to the term.

3) A pointer to where the postings are to be placed.

4) and 5) A two-word context descriptor.

6) A pointer to the Logon Block.

The prefix/suffix descriptor contains only two bits of useful
information: if bit 2 is on (i.e. equals 1), then prefixing is to be
used; if bit 3 is on, then suffixing is to be used. If both bits are
on, then both prefixing and suffixing are to be used. The pointer to
the term points to a term in the "TERMS" block in the following format:
one word containing the length in characters (bytes) of the term,
followed immediately by the text of the term. The context descriptor
is in the standard form shown in Sec. 3.8.1.4.
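The descriptor bits and the length-prefixed term format can be
sketched as follows. Bit numbers are assumed to count from 0 at the
least significant bit, as on the PDP-11, and the TERMS block is assumed
to hold 16-bit little-endian words; `truncation_mode` and `read_term`
are hypothetical names.

```python
import struct

PREFIX_BIT = 1 << 2  # bit 2: prefixing requested
SUFFIX_BIT = 1 << 3  # bit 3: suffixing requested


def truncation_mode(descriptor: int) -> str:
    """Name the handler that the descriptor word selects."""
    prefix = bool(descriptor & PREFIX_BIT)
    suffix = bool(descriptor & SUFFIX_BIT)
    if prefix and suffix:
        return "BOTH"
    if prefix:
        return "PREFIX"
    if suffix:
        return "SUFFIX"
    return "EXACT"  # exact lookup through the hash tables


def read_term(terms_block: bytes, offset: int) -> str:
    """Read a term stored as a length word followed by its characters."""
    (length,) = struct.unpack_from("<H", terms_block, offset)
    start = offset + 2
    return terms_block[start:start + length].decode("ascii")
```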
After the usual housekeeping, the Index and Postings Handler first
checks to see if the term contains any non-alphanumeric characters. If
it does, then full-text searching will be required of all documents
containing as a substring any token within the term. If, as is
normally the case, there are no special characters in the term, the
Index and Postings Handler checks to see if prefixing or suffixing has
been specified for this term in the descriptor word. If either or both
have been specified, then control is passed to one of three
special-purpose routines (PREFIX, SUFFIX, and BOTH), which are
described later.

In the simplest case (where the user has requested a term with no
prefixing, suffixing, or special characters), the Index and Postings
Handler hashes on the first two characters of the term to obtain the
address in the Index File at which to begin searching for an exact
match to the search term. The hash is done by treating the bytes
containing the characters being hashed as if they were numeric values.
The first character is multiplied by two (i.e. shifted left one bit)
and is used as an index into table HASH1; the doubling turns the
character code into a byte offset into this table of words. A one-word
value is retrieved from the indexed location in HASH1. If the value is
FFFF base 16, then the character doesn't exist in the index. The second
character is then treated similarly, retrieving a byte value from
HASH2, which is added to the word value retrieved from HASH1 (unless it
is FF base 16, the flag that a character is non-existent) to obtain a
disk address in the index file at which terms beginning with this
bigram are located. If either value has been flagged as non-existent,
the Index and Postings Handler marks this fact and immediately executes
a $RETN trap. If a valid disk address has been obtained by the hash,
the Index and Postings Handler uses the subroutine GETNDX to retrieve
the index listing for the term in question. Once the index entry has
been found (if it exists), the Index and Postings Handler uses the
subroutine GETPST to retrieve the postings (list of document accession
numbers) for this term and calls the Merger to merge this list with any
previously generated lists (this occurs primarily when handling special
cases such as prefixing).
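The two-table hash can be sketched in Python. The tables are modeled
as lists of 256 integers, terms are assumed to be ASCII, and terms
shorter than two characters are simply rejected because the text does
not say how they are handled; `index_address` is a hypothetical name.

```python
from typing import List, Optional

WORD_MISSING = 0xFFFF  # HASH1 flag: first character not in the index
BYTE_MISSING = 0xFF    # HASH2 flag: second character not in the index


def index_address(term: str, hash1: List[int],
                  hash2: List[int]) -> Optional[int]:
    """Return the index-file disk address for the term's first two
    characters, or None if either character is flagged as missing."""
    if len(term) < 2:
        raise ValueError("this sketch assumes terms of two or more chars")
    first, second = term.encode("ascii")[:2]
    # The machine code doubles the first character to get a byte offset
    # into the 256-word HASH1 table; halving it gives the list index.
    byte_offset = first << 1
    base = hash1[byte_offset // 2]
    if base == WORD_MISSING:
        return None
    delta = hash2[second]  # HASH2 holds bytes, so no doubling is needed
    if delta == BYTE_MISSING:
        return None
    return base + delta
```

If HASH1 maps "T" to 400 and HASH2 maps "H" to 12, every term beginning
with "TH" is looked up starting at index-file address 412.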

Once  the  final  postings  list  has  been  constructed  it  is  read 
into  the  buffer  specified  in  the  six  word  descriptor  table.  If  the 
results  list  is  too  long  to  fit  in  the  buffer,  then  only  the  first 
block  is  read  in,  with  a  link  pointer  to  the  remainder  on  disk. 
After  some  fairly  messy  housekeeping  a  $RETN  trap  is  done. 

The special subroutines SUFFIX, PREFIX, and BOTH handle finding all
truncated matches for terms and merging the postings list of each newly
found term into the list of those already found. These routines use the
common exit routine to clean up after execution and do the $RETN trap
that returns control to the Search Supervisor.
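The text does not define exactly what PREFIX, SUFFIX, and BOTH match.
The sketch below assumes that SUFFIX lets any ending follow the term,
PREFIX lets any beginning precede it, and BOTH allows either, and it
models the index as an in-memory dictionary from tokens to accession
numbers. It only illustrates the find-and-merge pattern; the real
routines work through the index file, and `matches` and
`truncated_postings` are hypothetical names.

```python
from typing import Dict, List


def matches(token: str, term: str, mode: str) -> bool:
    """Assumed reading of the truncation modes."""
    if mode == "SUFFIX":
        return token.startswith(term)
    if mode == "PREFIX":
        return token.endswith(term)
    if mode == "BOTH":
        return term in token
    return token == term


def truncated_postings(index: Dict[str, List[int]], term: str,
                       mode: str) -> List[int]:
    """OR-merge the postings of every token the truncated term matches."""
    found: List[int] = []
    for token, postings in index.items():
        if matches(token, term, mode):
            # Merge this token's postings into the list found so far.
            found = sorted(set(found) | set(postings))
    return found
```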
3.9 MERGER

The Merger (MERGE) is used to perform Boolean operations (AND, OR, and
AND NOT) on lists of document accession numbers. A subsidiary function
is the allocation/deallocation of scratch disk space for itself and for
the Search Supervisor, Index and Postings Handler, and the Full-Text
Searcher. Parameters are passed to this module in the form of a
nine-word parameter list containing:

Word 0       operation descriptor
Word 1       pointer to left-hand side list
Word 2       pointer to right-hand side list
Word 3       pointer to results buffer
Words 4, 5   two-word context descriptor
Word 6       pointer to left-hand side buffer
Word 7       pointer to right-hand side buffer
Word 8       disk address of result list overflow

In word 0, the operation descriptor, only the high-order byte is
meaningful. The bits of this byte have the following meanings:

Bits 15, 14:  Binary operation code as described in Section 3.6.1.
Bit 11:       1 => Free the scratch disk space whose starting address
              is located in word 8 of the parameter list (bytes 15, 16).
Bit 10:       1 => Allocate a scratch disk space and place the starting
              address in word 8 of the parameter list.
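A sketch of decoding word 0. The bit positions come from the list
above; the numeric meaning of the two-bit operation code is defined in
Section 3.6.1 and is therefore left undecoded. `Request` and
`decode_descriptor` are hypothetical names.

```python
from typing import NamedTuple

FREE_BIT = 1 << 11   # free the scratch space addressed by word 8
ALLOC_BIT = 1 << 10  # allocate scratch space, address returned in word 8


class Request(NamedTuple):
    opcode: int  # bits 15,14; values as defined in Sec. 3.6.1
    free: bool
    allocate: bool


def decode_descriptor(word0: int) -> Request:
    word0 &= 0xFF00  # only the high-order byte is meaningful
    return Request(opcode=(word0 >> 14) & 0b11,
                   free=bool(word0 & FREE_BIT),
                   allocate=bool(word0 & ALLOC_BIT))
```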
Words 1 and 2, the left- and right-hand side list pointers, are the
starting memory addresses of the two document accession number lists to
be merged. The first word of each list contains the number of document
accession numbers (postings) in the list, while the second word
contains the number of postings in this block. These words are followed
by the document accession list in the standard two-word format
described in Sec. 3.8.1.4.

Word  3  points  to  the  memory  buffer  in  which  to  store  the  result 
list.  This  buffer  is  only  one  block  long,  so  if  the  result  list  is 
over  one  block  long  only  the  first  block  of  the  list  is  stored  here. 
The  rest  of  the  list  is  stored  on  disk  as  a  one-way  linked  list  with 
the  starting  block  number  stored  in  word  8  of  the  parameter  list. 

Words  4  and  5  are  a  standard  context  descriptor  that  is  used 
for  setting  context  bits  in  entries  in  the  result  list  for  use  by 
the  Full-Text  Searcher. 

Words  6  and  7  are  pointers  to  the  head  of  the  buffer  used  by 
the  Index  and  Postings  Handler  for  reading  in  the  information  from 
the  Postings  file.  Words  1  and  2  are  addresses  within  this  buffer. 
The  buffer  addresses  are  provided  in  case  the  posting  spreads  across 
more  than  one  block. 
Word 8 is used to return the disk addresses of result list overflow
lists and of freshly allocated disk space, and to receive the addresses
of scratch disk buffers to be deallocated.

The first action of the Merger is the decoding of the operation
descriptor. If the request in the parameter list is for disk buffer
allocation/deallocation, then subroutine BMAP is called. BMAP is a
straightforward bitmap handler which keeps track of which disk buffers
are in use. As soon as BMAP has updated its bitmap and placed its
result in the parameter list, a $RETN trap is executed to return
control to the calling routine.
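BMAP's bookkeeping can be illustrated with a first-fit bitmap, one bit
per scratch disk buffer. The class name `DiskBitmap`, the first-fit
policy, and the base address are assumptions for the sketch, not
details taken from BMAP itself.

```python
class DiskBitmap:
    """Hypothetical stand-in for BMAP: one bit per scratch disk buffer."""

    def __init__(self, n_buffers: int, first_address: int = 0) -> None:
        self.bits = bytearray((n_buffers + 7) // 8)
        self.n = n_buffers
        self.first = first_address

    def allocate(self) -> int:
        """Mark the first free buffer as in use and return its address."""
        for i in range(self.n):
            byte, bit = divmod(i, 8)
            if not self.bits[byte] & (1 << bit):
                self.bits[byte] |= 1 << bit
                return self.first + i
        raise RuntimeError("no scratch disk space left")

    def free(self, address: int) -> None:
        """Clear the bit of a buffer so it can be allocated again."""
        i = address - self.first
        if not 0 <= i < self.n:
            raise ValueError("address outside the scratch area")
        byte, bit = divmod(i, 8)
        self.bits[byte] &= ~(1 << bit) & 0xFF
```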

Boolean operations on document accession number lists are performed in
separate loops (one for each operation) that compare the two lists on
an element-by-element basis, generating the result list with the
correct context bits set as they do so. After the two lists have been
merged into a result list, a $RETN trap is executed, returning control
to the calling routine.
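The element-by-element loops can be sketched over sorted lists of
(accession number, context bits) pairs. How the Merger actually sets
context bits is not specified here, so this sketch simply ORs the bits
of a document found in both lists and keeps the left-hand bits for
AND NOT; treat that rule, and the function names, as assumptions.

```python
from typing import List, Tuple

Entry = Tuple[int, int]  # (document accession number, context bits)


def merge_and(a: List[Entry], b: List[Entry]) -> List[Entry]:
    i = j = 0
    out: List[Entry] = []
    while i < len(a) and j < len(b):
        if a[i][0] == b[j][0]:
            out.append((a[i][0], a[i][1] | b[j][1]))
            i += 1
            j += 1
        elif a[i][0] < b[j][0]:
            i += 1
        else:
            j += 1
    return out


def merge_or(a: List[Entry], b: List[Entry]) -> List[Entry]:
    i = j = 0
    out: List[Entry] = []
    while i < len(a) or j < len(b):
        if j == len(b) or (i < len(a) and a[i][0] < b[j][0]):
            out.append(a[i])
            i += 1
        elif i == len(a) or b[j][0] < a[i][0]:
            out.append(b[j])
            j += 1
        else:  # the same document appears in both lists
            out.append((a[i][0], a[i][1] | b[j][1]))
            i += 1
            j += 1
    return out


def merge_and_not(a: List[Entry], b: List[Entry]) -> List[Entry]:
    j = 0
    out: List[Entry] = []
    for entry in a:
        while j < len(b) and b[j][0] < entry[0]:
            j += 1
        if j < len(b) and b[j][0] == entry[0]:
            continue  # document excluded by the right-hand list
        out.append(entry)
    return out
```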

It  is  hoped  that  this  routine  will  be  replaced  by  a  hardware 
merger  at  some  time  in  the  near  future. 
3.10 FULL-TEXT SEARCHER

The Full-Text Searcher (FTSRCH) module does all the full-text
searching, browsing, and text display for the user. This module is
made up of three main routines: FTSRCH, the full-text searching
routine; BROWSE, the browse mode handler; and PRNTR, the document text
display routine. It is initiated (via a $PRFM trap) by the Search
Supervisor and receives the address of the Search Supervisor Table in
register R0.

3.10.1 FULL-TEXT SEARCHING ROUTINE

The Full-Text Searching routine (FTSRCH) is called during each search
after the Index and Postings Handler has constructed a list of all
documents that contain the proper Boolean conjunction of search terms.
The Full-Text Searcher sorts this list of documents into
highest-count-field-first sequence and rewrites it to disk. If the user
has requested that all text printed for this query be displayed upon
the line printer, a $LOCK trap is executed to lock the line printer.
The Full-Text Searcher then determines what type of search (if any) is
to be performed on documents. Next, the Full-Text Searcher enters a
loop that reads in the directory for each document in the list, one at
a time. Tests are then made to determine if any type of full-text
search or text print is to be performed on this document. If a
full-text search is to be performed, control is passed to the
appropriate controlling loop for the type of search to be performed. If
no search is required, a JSR is made to the Text Printer (PRNTR)
routine. The search controlling loops set up parameters for the
subroutine LEVEL1, which does the actual searching. Each time a "hit"
is found in the text being searched, LEVEL1 returns control to the
parent loop, which handles coordinating Boolean conjunctions of terms
within contexts, etc. Whenever some text that satisfies the search
request is found, control is passed to the Text Printer routine, which
formats and displays the text if a print has been specified.
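The highest-count-first ordering amounts to a stable sort on the
occurrence count, as in this sketch, which assumes each list entry is a
(document number, context bits, count) triple; `highest_count_first`
is a hypothetical name.

```python
from typing import List, Tuple

Posting = Tuple[int, int, int]  # (document number, context bits, count)


def highest_count_first(postings: List[Posting]) -> List[Posting]:
    """Order documents so the highest occurrence counts come first."""
    return sorted(postings, key=lambda p: p[2], reverse=True)
```

Because Python's `sorted` stays stable with `reverse=True`, documents
with equal counts keep their original (accession-number) order.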

3.10.2 TEXT PRINTER ROUTINE

The Text Printer routine (PRNT) does all text display for the EUREKA
system. It is called by the Full-Text Searcher routine and the Browse
Mode Handler. If the user has requested that some term(s) be found in
the same paragraph and that the sentence in which they occur be
printed, or that they be found in the same sentence and that the
paragraph in which they occur be printed, then this routine is passed
the starting and stopping addresses of the clause that satisfies the
Print Clause. Under any other combination of requests, the Text
Printer gets all needed information from the Search Supervisor Table.
The Text Printer has several special subroutines used for handling the

"FIND <Term Expression> IN SENTENCE PRINT PARAGRAPH"

type of situation. These routines use "inside knowledge" about starting
and stopping addresses in the text, etc., to set up parameters for the
regular formatting and marking routines and then perform them just as
the main body of the Text Printer does.

The main body of the Text Printer first goes through a series of tests
to see if individual contexts are to be printed. On each "hit", the
appropriate context is moved into the parameter areas of the workspace
and the PRINT1 subroutine, which handles the mechanics of printing one
context, is called. For some contexts, such as a SENTENCE print during
an "IN PARAGRAPH" find, the special-purpose routines described above
must be called.

The PRINT1 subroutine uses the subroutine RDTXT to read in the text
containing the context to be printed, the subroutine MARK to mark the
text to be displayed, and the subroutine FORMAT to actually format and
display the marked text. MARK does little more than insert a special
fern in front of search terms to mark them for FORMAT and will not be
discussed in any greater detail. FORMAT handles the mechanics of moving
the text to be displayed into the print buffer, deleting ferns, moving
asterisks to column one of any buffer that contains a mark fern, and
actually displaying the text. After the text has been displayed, a JSR
is made to the Browse Mode Handler (BROWSE), unless the information is
being displayed upon the line printer, in which case the JSR is skipped
and an immediate RTS is done.
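A rough sketch of the MARK and FORMAT division of labor. A control
character stands in for the mark fern, lines are filled word by word to
an assumed width, and a line containing a marked word gets an asterisk
in column one; the character choice, the width, and the function names
are all hypothetical.

```python
from typing import Iterable, List

MARK = "\x01"  # hypothetical stand-in for the mark fern character


def mark(text: str, terms: Iterable[str]) -> str:
    """Put a mark in front of every word that matches a search term."""
    wanted = {t.upper() for t in terms}
    out = []
    for word in text.split():
        bare = word.strip(".,;:!?\"'()").upper()
        out.append(MARK + word if bare in wanted else word)
    return " ".join(out)


def _line(words: List[str], flagged: bool) -> str:
    # Column one holds '*' for lines containing a search term.
    return ("*" if flagged else " ") + " " + " ".join(words)


def format_lines(marked: str, width: int = 72) -> List[str]:
    """Fill lines, delete the marks, and flag lines that held a mark."""
    lines: List[str] = []
    current: List[str] = []
    flagged = False
    for word in marked.split():
        is_marked = word.startswith(MARK)
        clean = word.lstrip(MARK)
        # Two columns are reserved for the flag character and a space.
        if current and len(" ".join(current + [clean])) > width - 2:
            lines.append(_line(current, flagged))
            current, flagged = [], False
        current.append(clean)
        flagged = flagged or is_marked
    if current:
        lines.append(_line(current, flagged))
    return lines
```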

3.10.3 BROWSE MODE HANDLER

The Browse Mode Handler (BROWSE) controls all interaction with the user
while text display is in progress. It prompts the user each time a
context is printed and examines his reply to see if any browse commands
have been entered. If no browse mode commands have been entered, an
immediate RTS is done to return control to the Text Printer routine. If
the user has entered a command that requests printing of another
context or of the previous/succeeding sentence or paragraph, the Browse
Mode Handler must take care of the mechanics of retrieving the text to
be displayed, setting up parameters for the subroutine FORMAT, and
calling it to have the text printed. If the user has entered a comment
string to be attached to the document currently being viewed, the
Browse Mode Handler must build a Set Handler Table containing the Logon
Block pointer, the document accession number (with bit 15 set to 1 to
flag it as a document), and a pointer to the comment string. The Browse
Mode Handler then performs the Set Handler and re-prompts the user for
another command.
3.11 SET INFORMATION PRINTER

The  Set  Information  Printer  (INPOPT)  module  is  used  to  retrieve 
information  from  the  user's  personal  file  and  display  it.  Its 
function,  therefore,  is  primarily  that  of  calling  the  Set  Handler  to 
retrieve  information  from  the  user's  file,  format  it,  and  display  it 
upon  either  the  user's  terminal  or  upon  the  line  printer. 

The first operation of the Set Information Printer is some
housekeeping, which includes workspace allocation, line buffer header
initialization [1], locking (via a $LOCK trap) the line printer if
necessary, and various buffer initialization. The Set Information
Printer then determines whether the information to be printed is a
macro or a query set and proceeds to the proper section of the module.
We shall first look at the case of a user macro print.

In the case of a user macro print, the Set Information Printer first
checks to see if the macro identifier in the Set Handler Table is
"ALL". If it is, the user has requested that all macro definitions in
the user's file be displayed. In this case the Set Information Printer
must first request a list of all the macro identifiers in the user's
directory from the Set Handler. Once it has this list, it goes into a
loop which moves one macro identifier at a time into the Set Handler
Table and JSRs into the display subroutine once for each macro. If the
identifier is not "ALL", the user has requested to see a single macro
definition, and the Set Information Printer need only perform (via a
JSR) the display subroutine once and then exit by executing a $RETN
trap.

The query set print section of the module works in much the same
fashion as the macro print. All query set print requests are assumed to
be of the form:

PRINT <Query ID 1> TO <Query ID 2>

This form is reflected in the use of a modified Set Handler Table for
passing parameters to the Set Information Printer module. This table
looks like a Type I Set Handler Table with a second Query Set
Descriptor (describing <Query ID 2>) following the first. If the user
has requested the display of a single query, the second Query Set
Descriptor is zeroed. If the user has specified either <Query ID> via a
mnemonic name, the Set Information Printer requests the query number of
that query set from the Set Handler in order to use it as a bound on
the printing loop. Once the Set Information Printer has both query
numbers, it clears the set name field of the Set Handler Table, moves
the lower of the two query numbers to the query number field of the
table, and subtracts one from the query number field. It then enters a
loop that:

1) Adds one to the query number field of the Set Handler Table;

2) Compares the query number to the upper bound of the print request:
   if it is less than or equal to the upper bound, it performs the
   display subroutine once and loops back to (1); if it is greater than
   the upper bound, the Set Information Printer jumps to the exit
   routine.

The display subroutine (PRINT) is referenced by both the macro print
section and the query set print section. It receives as input a Type I
Set Handler Table containing the correct identifiers/buffer pointers to
read in the information to be printed. The display routine calls the
Set Handler once for each pertinent block of information to be
retrieved (query/macro text, query set list, comments). When the Set
Handler returns control to the display routine, it formats the data for
display, prints it wherever the user has requested that it be printed,
and executes an RTS instruction to return control to the controlling
loop. If the Set Handler has returned an error message of "INVALID SET
ID" on the stack, the display routine merely clears the stack and does
an RTS, returning control to either the macro print loop or the query
set print loop, thus discarding the error message. All other error
messages are propagated back up the tree of tasks. This allows the Set
Information Printer to handle cases where the user requests to see all
query sets between <Query Number L> and <Query Number N> when some
<Query Number M>, L < M < N, has been deleted.
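The query set print loop and its error filtering can be sketched as
follows. `fetch` stands in for the display subroutine's Set Handler
calls, Python exceptions stand in for error messages on the stack, and
`SetHandlerError` and `print_range` are hypothetical names.

```python
from typing import Callable, List, Optional


class SetHandlerError(Exception):
    """Hypothetical stand-in for an error message returned on the stack."""


def print_range(first: int, second: Optional[int],
                fetch: Callable[[int], str],
                display: Callable[[str], None]) -> List[int]:
    """Print query sets first..second, silently skipping deleted ones."""
    if second is None:  # second descriptor zeroed: a single query
        second = first
    low, high = min(first, second), max(first, second)
    printed: List[int] = []
    number = low - 1  # the loop adds one before each comparison
    while True:
        number += 1
        if number > high:
            break  # past the upper bound: go to the exit routine
        try:
            record = fetch(number)
        except SetHandlerError as err:
            if str(err) == "INVALID SET ID":
                continue  # deleted query set: discard the error
            raise  # every other error propagates upward
        display(record)
        printed.append(number)
    return printed
```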



The exit routine of the Set Information Printer checks to see if the
line printer has been locked and, if so, unlocks it via the
corresponding unlock trap. Following this, a $RETN trap is executed to
return control to the PRINT Statement Parser.
APPENDIX A
Descriptions of Context Terms

The following are the current definitions of the context terms:

SENTENCE     any text between two periods (.).
PARAGRAPH    any text appearing between two paragraph ferns.
COMMENTS     user-written and assigned comment strings.
REFERENCES   any references listed by the author of the paper.
NOTES        not used at present.
TEXT         abstract, body, notes, and references.
KEYS         not used at present.
INDEX        not used at present.
MISC         junk.
PAGES        journal page numbers on which the article occurred.
DATE         date written, for dated papers.
SOURCE       journal from which taken.
TITLE        title of document.
AUTHOR       author of document.
DATA         includes author, title, source, date, pages, and misc.
ARTICLE      everything.
DOCUMENT     same as article.
BODY         text of article excluding abstract.
ABSTRACT     abstract of article.

Obviously  not  all  of  the  contexts  are  applicable   in  all 
cases,  and  in  fact  are  presently  somewhat  sparsely  filled  in. 
APPENDIX B
Error Messages


INVALID COMMAND:
EUREKA didn't recognize the first word of your command. Check for
spelling or abbreviation errors.

SET NAME > 10 CHARACTERS:
You have attempted to attach a name with 11 or more letters to a query
set. This is not permitted. Watch for concatenated words on
multiple-line commands.

ILLEGAL OR IMMORAL USE OF QUOTES:
Check for unmatched single quotes ('). Also check for single quotes
used in improper places.

INVALID WORD FOLLOWS "ALL":
EUREKA cannot parse the part of your command that comes after the word
"all".

CANNOT DELETE DOCUMENT:
EUREKA will not allow you to delete a document, only the comments
attached to the document.

ILLEGAL USE OF BRACKETS:
EUREKA has discovered some brackets where it doesn't expect them.

QUERY OR DOCUMENT NO. TOO BIG:
You have just input a query/document number that is far bigger than
the number assigned to any query or document in the EUREKA system.

MISSING SET NAME IN CHANGE STMT.:
EUREKA cannot find the name of the query set you wish to have changed.

MISSING KEYWORD OR SETNAME IN CHANGE STATEMENT:
EUREKA cannot find "TO" in your CHANGE statement. Check to see if you
have two Set Names separated by " TO ".

TOO MANY SET NAMES:
EUREKA is confused. There are too many names there.

QUERY OR DOCUMENT NO. IS NOT NUMERIC:
EUREKA has found a string of letters in a place it expected only
numbers. Be sure you haven't typed the letter "O" instead of the digit
"0".

NO SET NAME IN DELETE STATEMENT:
Did you tell EUREKA what you wanted deleted? It doesn't think you did!

ILLEGAL USE OF BRACKETS:
Check the usage of brackets ("[" or "]") for validity.

NAME OF DOCUMENT CANNOT BE CHANGED:
You may not assign a name to a document. A close approximation is to
use the MAKE statement to create a query set with only the desired
document in the set.

INVALID CHARACTERS IN SET NAME:
A Set Name cannot contain any symbols other than letters of the
alphabet or numbers; the first character of the Set Name must be
non-numeric.

MISSING SET NAME:
You've left out the name of a Set somewhere.

LOGICAL END OF STATEMENT REACHED PREMATURELY:
EUREKA thinks you started something that you didn't finish. Check your
brackets, etc.

PARSER TABLE OVERFLOW:
Program couldn't handle your expression; please use smaller queries to
achieve your final result.

ILLEGAL SUCESSOR:
You probably have either left out an operator (*, +, or -), or have two
of them with nothing in between them.

ILLEGAL CONTEXT:
EUREKA doesn't recognize the context you want it to use. Check your
spelling or abbreviation.

ILLEGAL SET NAME:
Check to see if you have used a reserved word or invalid characters.

TOO MANY LEVELS OF PARENTESES:
Your expression is too complicated. Break it up into several FIND/MAKE
statements.

TOO MANY TERMS OR PARENTHESES:
Same as previous message.

ILLEGAL TERM:
Most likely explanation is that you have too many or too few single
quotes (') in your search expression.

TOO MANY DISJUNCTIONS:
Expression too complicated. Use several smaller queries.

ILLEGAL SET OR DOCUMENT NO.:
The number is either non-numeric or too big.

EXPRESSION TOO COMPLICATED:
Use several smaller queries and/or MAKE statements to accomplish your
grand design.

INVALID USER ID:
Check your spelling. If you cannot log on, see a systems programmer.

INVALID BITMAP:
Some of your records may be missing. Notify a systems programmer.

INVALID BOOLEAN CONJUNCTION OF SETNAME-#'S:
You've probably left out or put in an extra Boolean operator
(*, +, -).

YOU HAVE ATTEMPTED TO RENAME A NON-EXISTENT SET:
Self-explanatory. Check your spelling.

YOU HAVE ATTEMPTED TO READ A NON-EXISTENT SET:
Same as previous error message.

YOU HAVE ATTEMPTED TO COMMENT A NON-EXISTENT SET:
Same as previous error message.

YOU HAVE RUN OUT OF DISK SPACE. PLEASE DELETE SOMETHING:
Your disk space is completely full; EUREKA cannot find enough space to
finish processing your last command. Delete some sets and/or comments
before proceeding.

SET EXPRESSION NOT VALID:
You have probably left out a Boolean operator (*, +, -) in your set
expression.

NO "LAST" SET EXISTS:
You have attempted to access the "LAST" set, which has been deleted or
somehow changed.

ILLEGAL MACRO NAME:
Macro names must obey the same regulations as query set names.

YOU MUST LOGON!:
No queries may be entered until you have logged on.

SYSTEM ERROR:
Program error; call system programmer immediately.